Which Transcription Model Should You Use in Resonant?
Resonant ships with ten different speech-to-text models. They all run locally on your Mac — no cloud, no account, nothing leaves the machine. But they are not all the same, and picking the right one makes a real difference.
Some are tuned for accuracy in English. Some are built for speed. Some are purpose-built for Mandarin, Japanese, Korean, or Russian. One covers over 1,600 languages. Here's what each one is and when to use it.
Start here: Parakeet TDT 0.6B v3
Best for: English and European languages. Recommended for most people.
Size: ~640 MB — Languages: English + 24 European languages
Parakeet is the default model in Resonant for a reason. It was built by NVIDIA on the NeMo FastConformer architecture, trained on over 660,000 hours of audio, and it shows. Word error rates on English, German, Spanish, Italian, and French are among the lowest of any locally-runnable model.
It auto-detects language across 25 European languages — you don't need to tell it what you're speaking. German comes in at 5.04% WER on FLEURS benchmarks. Spanish at 3.45%. Italian at 3.00%. For English dictation specifically, it's hard to beat at any price.
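For readers unfamiliar with the metric, here is a minimal sketch of how a word error rate is computed: the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the reference length. The function name and the lowercase-and-split normalization are assumptions made for illustration; published benchmarks such as FLEURS apply their own text normalization, so this will not reproduce the figures above exactly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from the transcript
            insertion = curr[j - 1] + 1  # extra word in the transcript
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of four reference words.
print(word_error_rate("the quick brown fox", "the quick brown box"))
```

A 3.00% WER therefore means roughly three word-level mistakes per hundred reference words.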
Parakeet also supports hotwords: you can bias it toward proper nouns, technical terms, or product names that matter to your work. If you dictate anything involving names, jargon, or specialized vocabulary, that feature alone makes it the right starting point.
If you speak English or any Western European language and you're not sure which model to use, use Parakeet. Switch only if you have a specific reason.
For lightweight English use: Moonshine v2 Medium
Best for: English-only workflows where you want a smaller footprint.
Size: ~200 MB — Languages: English only
Moonshine was built by Useful Sensors specifically for edge devices. The v2 Medium model has 245M parameters, and its ~200 MB download is about a third the size of Parakeet's. Its accuracy holds up well for everyday dictation: a 6.65% word error rate on standard benchmarks.
The key difference from Whisper-based models is architecture. Moonshine doesn't pad audio to fixed 30-second chunks like Whisper does, which means shorter utterances process without unnecessary overhead. It was designed to be efficient, and on Apple Silicon that efficiency is noticeable.
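To get a feel for the padding overhead, the sketch below estimates what share of a fixed 30-second input window is padding for a given utterance length. The function name is hypothetical, and treating longer audio as a series of full windows with only the last one padded is a simplifying assumption, not a description of any particular Whisper implementation.

```python
import math

WINDOW_SECONDS = 30.0  # fixed chunk length described above


def padded_fraction(utterance_seconds: float,
                    window_seconds: float = WINDOW_SECONDS) -> float:
    """Share of a fixed-window model's input that is padding, not speech."""
    if utterance_seconds <= 0:
        raise ValueError("utterance_seconds must be positive")
    # Assumption: audio is cut into full windows and only the last is padded.
    windows = math.ceil(utterance_seconds / window_seconds)
    total = windows * window_seconds
    return (total - utterance_seconds) / total


for seconds in (3.0, 10.0, 30.0):
    print(f"{seconds:>4.0f} s utterance: {padded_fraction(seconds):.0%} padding")
```

Under these assumptions, a three-second voice command fills only a tenth of the window, which is the overhead a variable-length model like Moonshine avoids.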
Choose Moonshine v2 Medium if you dictate in English, you want to keep your model download small, and you don't need hotwords or multilingual support.
For 99 languages: Whisper Large V3 Turbo
Best for: Any language not served by one of Resonant's dedicated models.
Size: ~1 GB — Languages: 99 languages
OpenAI's Whisper is the model that proved high-accuracy offline transcription was possible at scale. Whisper Large V3 Turbo is a pruned, fine-tuned version of Whisper Large V3: 809M parameters with only 4 decoder layers instead of the full 32, which makes it significantly faster while keeping the broad language support intact.
If you dictate in Arabic, Hindi, Vietnamese, Thai, Hebrew, Turkish, Indonesian, or any of the other 80+ languages that Parakeet and SenseVoice don't cover, Whisper Turbo is your model. It defaults to English but you can point it at any of the 99 supported languages in Resonant's settings.
It's among the most widely trusted offline transcription models in the world. If you're evaluating Resonant against another tool that uses Whisper in the cloud, that tool is likely running a model from the same family, except yours runs locally.
For faster English Whisper: Whisper Distil Large v3.5
Best for: English-only Whisper users who want more speed.
Size: ~1 GB — Languages: English only
Distil-Whisper is a knowledge-distilled version of Whisper Large V3, trained specifically on English. Its accuracy on English short-form audio is within about 1% of Whisper Large V3 Turbo, but it runs about 1.5x faster.
If your work is English-only and you find yourself transcribing frequently throughout the day, that speed difference adds up. The download size is comparable to Whisper Turbo, so the only tradeoff is losing multilingual support you may not need.
For East Asian languages: SenseVoice Small
Best for: Mandarin, Japanese, Korean, and Cantonese.
Size: ~226 MB — Languages: Mandarin, English, Japanese, Korean, Cantonese
SenseVoice was built by Alibaba Research for exactly this use case. It uses a non-autoregressive CTC architecture with a real-time factor (RTF) of roughly 0.10: processing takes about one-tenth of the audio's duration, so a ten-second clip takes about one second to transcribe. That's very fast.
It auto-detects across its five supported languages, so mixed-language dictation between, say, Mandarin and English works without switching modes. At 226 MB, the download is light for what it covers.
If you regularly switch between English and any East Asian language, SenseVoice is the right choice. For Mandarin-primary users who need the highest possible accuracy on Chinese, consider FireRedASR Large instead (below).
For Mandarin accuracy: FireRedASR Large
Best for: Mandarin-primary speakers who prioritize accuracy above all else.
Size: ~1.7 GB — Languages: Mandarin Chinese + English
FireRedASR Large is the best locally-runnable Mandarin model available. Built by the FireRed team on an attention encoder-decoder architecture, it achieves 3.18% character error rate on Mandarin benchmarks — state of the art for an offline model. It also handles Chinese dialects and code-switching between Mandarin and English.
The download is larger at 1.7 GB, and it runs somewhat slower than SenseVoice. But if your primary use is professional Mandarin dictation — documents, correspondence, clinical notes — the accuracy difference is worth it.
For Japanese: Zipformer Japanese
Best for: Japanese-only speakers who want the highest accuracy.
Size: ~148 MB — Languages: Japanese only
This model was trained on 35,000 hours of ReazonSpeech v2.0 data, one of the largest Japanese speech corpora publicly available. The Icefall Zipformer architecture runs at RTF 0.08, which works out to roughly 12 times faster than real time on any modern Mac.
SenseVoice covers Japanese, but if Japanese is your primary or only language, this dedicated model will generally outperform it. 148 MB is a compact download for what it delivers.
For Korean: Zipformer Korean
Best for: Korean speakers.
Size: ~68 MB — Languages: Korean only
At 68 MB, Zipformer Korean is the smallest model in Resonant. It runs about 29x faster than real time (RTF 0.034), which makes it the fastest model in the lineup. A minute of speech processes in about two seconds on Apple Silicon.
If you dictate in Korean, this is the right choice. The download is negligible and the speed is unmatched.
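The real-time factors quoted for SenseVoice, Zipformer Japanese, and Zipformer Korean all convert the same way: processing time is audio duration multiplied by RTF, and the speedup over real time is one divided by RTF. The helper names below are made up for illustration; the RTF values are the ones stated in this article.

```python
def processing_seconds(audio_seconds: float, rtf: float) -> float:
    """Time needed to transcribe a clip at a given real-time factor."""
    return audio_seconds * rtf


def speedup(rtf: float) -> float:
    """How many times faster than real time a given RTF is."""
    return 1.0 / rtf


# RTF figures as quoted in this article.
RTF_BY_MODEL = {
    "SenseVoice Small": 0.10,
    "Zipformer Japanese": 0.08,
    "Zipformer Korean": 0.034,
}

for name, rtf in RTF_BY_MODEL.items():
    seconds = processing_seconds(60, rtf)
    print(f"{name}: one minute of audio in about {seconds:.1f} s "
          f"({speedup(rtf):.1f}x real time)")
```

Actual timings depend on the Mac and on what else is running, so treat these as rough ratios rather than guarantees.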
For Russian: GigaAM v2 Russian
Best for: Russian-primary speakers.
Size: ~231 MB — Languages: Russian only
GigaAM v2 comes from SaluteSpeech, Sber's speech AI research team. It's commercially licensed and uses a NeMo transducer architecture trained specifically on Russian speech. It's the best locally-runnable Russian ASR model available.
If Russian is your primary dictation language, this model will significantly outperform Whisper Turbo on Russian content. 231 MB is compact for the coverage it provides.
For everything else: omniASR 300M
Best for: Any language not covered above.
Size: ~348 MB — Languages: 1,600+ languages
Meta's omniASR is a CTC model trained across over 1,600 languages. If you speak a language that isn't served by any of the dedicated models above and isn't in Whisper's 99-language set, omniASR is your option.
It covers many low-resource languages that no other local model does. Accuracy on well-represented languages is good; on lower-resource ones it varies, as it does with any model trained on limited data. But for languages with no other local option, it's a meaningful baseline.
Quick reference
| Model | Size | Best for |
|---|---|---|
| Parakeet TDT 0.6B v3 | 640 MB | English + 24 European languages. Start here. |
| Moonshine v2 Medium | 200 MB | English only. Lightweight alternative to Parakeet. |
| Whisper Large V3 Turbo | 1 GB | 99 languages. Use for languages without a dedicated model. |
| Whisper Distil Large v3.5 | 1 GB | English only. Faster than Turbo, accuracy within about 1%. |
| SenseVoice Small | 226 MB | Mandarin, Japanese, Korean, Cantonese, English. |
| FireRedASR Large | 1.7 GB | Mandarin-primary. Best Chinese accuracy. |
| Zipformer Japanese | 148 MB | Japanese-only. Trained on 35k hours. |
| Zipformer Korean | 68 MB | Korean-only. Smallest and fastest model. |
| GigaAM v2 Russian | 231 MB | Russian-only. Best Russian accuracy. |
| omniASR 300M | 348 MB | 1,600+ languages. Universal fallback. |
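The table can also be read as a decision procedure. The sketch below encodes it as a small lookup. It is not Resonant's own logic or API: the function, its flags, and the language codes are hypothetical, and the language sets contain only the languages this article names, not each model's full list.

```python
# Languages this article names explicitly; the real model lists are longer.
PARAKEET_LANGUAGES = {"en", "de", "es", "it", "fr"}
WHISPER_LANGUAGES = {"ar", "hi", "vi", "th", "he", "tr", "id"}
DEDICATED_MODELS = {
    "ja": "Zipformer Japanese",
    "ko": "Zipformer Korean",
    "ru": "GigaAM v2 Russian",
    "yue": "SenseVoice Small",  # Cantonese
}


def recommend_model(language: str,
                    prefer_small: bool = False,
                    prefer_accuracy: bool = False) -> str:
    """Pick a model name from the quick-reference table (illustrative only)."""
    lang = language.lower()
    if lang == "en" and prefer_small:
        return "Moonshine v2 Medium"
    if lang in PARAKEET_LANGUAGES:
        return "Parakeet TDT 0.6B v3"
    if lang == "zh":
        # Mandarin: FireRedASR for accuracy, SenseVoice for size and speed.
        return "FireRedASR Large" if prefer_accuracy else "SenseVoice Small"
    if lang in DEDICATED_MODELS:
        return DEDICATED_MODELS[lang]
    if lang in WHISPER_LANGUAGES:
        return "Whisper Large V3 Turbo"
    return "omniASR 300M"  # universal fallback


print(recommend_model("en"))
print(recommend_model("zh", prefer_accuracy=True))
print(recommend_model("ar"))
```

Checking the dedicated models before the broad multilingual ones mirrors the article's advice: use the most specialized model that covers your language, and fall back only when none does.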
How to switch models
Open Resonant's settings, go to the Transcription tab, and select a model from the list. Models that aren't downloaded yet will show a download button. The download happens in the background; Resonant will continue working with your current model while the new one downloads.
You can switch models at any time without restarting the app. If you're unsure where to start, Parakeet is already selected by default. Try it for a week. If you find yourself dictating in a language it doesn't cover, or if you want to compare accuracy on specific content, download a second model and switch between them.
All of this runs on your Mac. No account required. No audio sent anywhere. The model files live in your home directory and are yours to keep.
Download Resonant and pick your model.