Zipformer Japanese for Mac Dictation: A 2026 Guide
Zipformer is one of the more efficient open-source ASR architectures of the last few years, and a Japanese Zipformer model trained on the ReazonSpeech corpus is a popular pick for offline Japanese transcription. If you're looking at offline Japanese dictation on Mac, here's what the model is and what we ship in Resonant for Japanese instead.
What Zipformer is
Zipformer is an encoder architecture introduced by the k2/icefall project — a successor to Conformer designed to be both faster and more accurate at the same parameter count. It's widely used through the sherpa-onnx runtime, where pre-trained Zipformer checkpoints are available for many languages.
The Japanese checkpoint most people use is trained on ReazonSpeech, an open Japanese speech dataset of roughly 35,000 hours built from broadcast TV audio paired with its captions. The result is a compact (~148 MB) Japanese ASR model that runs locally and posts low error rates on Japanese benchmarks.
Where Zipformer Japanese is the right choice
If you're building a Japanese-only transcription pipeline — subtitling, batch transcription, self-hosted speech-to-text — the Zipformer Japanese checkpoint via sherpa-onnx is a solid open choice. The model is small, the runtime is portable, and the license is permissive.
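As a rough illustration of what such a pipeline can look like, here is a minimal Python sketch that decodes a single WAV file with the sherpa-onnx Python package. It rests on a few assumptions: the input is 16-bit PCM audio, the package is installed separately, and the model file names (encoder.onnx, decoder.onnx, joiner.onnx, tokens.txt) are hypothetical placeholders. The published checkpoint uses its own file names, so adjust the paths to match whatever you download.

```python
import array
import sys
import wave
from pathlib import Path


def read_wav_mono_float(path):
    """Read a 16-bit PCM WAV file and return (sample_rate, samples).

    Samples are floats in [-1.0, 1.0); multi-channel audio is reduced
    to its first channel.
    """
    with wave.open(str(path), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        sample_rate = wf.getframerate()
        channels = wf.getnchannels()
        pcm = array.array("h")
        pcm.frombytes(wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    # Frames are interleaved, so stepping by the channel count keeps channel 0.
    return sample_rate, [s / 32768.0 for s in pcm[::channels]]


def transcribe(model_dir, wav_path, num_threads=2):
    """Decode one WAV file with an offline Zipformer transducer."""
    # Imported lazily so the WAV helper works without the package installed.
    import sherpa_onnx  # third-party: pip install sherpa-onnx

    model_dir = Path(model_dir)
    recognizer = sherpa_onnx.OfflineRecognizer.from_transducer(
        # Hypothetical file names; use the ones shipped with your checkpoint.
        encoder=str(model_dir / "encoder.onnx"),
        decoder=str(model_dir / "decoder.onnx"),
        joiner=str(model_dir / "joiner.onnx"),
        tokens=str(model_dir / "tokens.txt"),
        num_threads=num_threads,
    )
    sample_rate, samples = read_wav_mono_float(wav_path)
    stream = recognizer.create_stream()
    stream.accept_waveform(sample_rate, samples)
    recognizer.decode_stream(stream)
    return stream.result.text


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python transcribe.py <model_dir> <audio.wav>
    print(transcribe(sys.argv[1], sys.argv[2]))
```

For batch work such as subtitling, the recognizer can be created once and reused across files, since loading the model is usually the expensive step.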
Why Resonant ships Qwen3 ASR for Japanese
Resonant is a real-time, system-wide dictation app, not a transcription pipeline. The people we hear from who dictate in Japanese also dictate in English — in the same email, often in the same sentence — and they want one model that handles both with no settings flip.
We ship Alibaba's Qwen3 ASR 0.6B for multilingual dictation, including Japanese. It covers 30+ languages, handles Japanese-English code-switching as a first-class case, and is compiled to Core ML to run on the Apple Neural Engine. A dedicated Japanese Zipformer can match or beat it on Japanese-only benchmarks, but Qwen3 ASR's breadth and code-switching fluency are why we picked it for live dictation.
Read more about Qwen3 ASR in Resonant.
The short version
The ReazonSpeech-trained Zipformer is a great open Japanese ASR model if you're building Japanese-only transcription. For live dictation across Japanese and English on Mac, Resonant ships Qwen3 ASR on the Apple Neural Engine, fully on-device, no cloud.
Download Resonant to try Japanese dictation on your Mac.