Resonant
Guide · Mar 3, 2026

Local Transcription Models in 2026: Parakeet, Whisper, and More

Resonant uses NVIDIA's Parakeet on Apple Silicon — a state-of-the-art on-device STT model tuned for low-latency live dictation. Audio never leaves your Mac.

Below is a field guide to the broader landscape of open local speech-to-text models in 2026 — Parakeet, Whisper, Moonshine, SenseVoice, and others — for context on why Parakeet is a good fit for live dictation on Apple Silicon today. The sections below describe each model on its own merits; Resonant itself ships only Parakeet.

Start here: Parakeet TDT 0.6B v3

Best for: English and European languages. Recommended for most people.
Size: ~640 MB — Languages: English + 24 European languages

Parakeet is the model Resonant ships. It was built by NVIDIA on the NeMo FastConformer architecture, trained on over 660,000 hours of audio, and it shows. Word error rates on English, German, Spanish, Italian, and French are among the lowest of any locally-runnable model.

It auto-detects language across 25 European languages — you don't need to tell it what you're speaking. German comes in at 5.04% WER on FLEURS benchmarks. Spanish at 3.45%. Italian at 3.00%. For English dictation specifically, it's hard to beat at any price.
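For readers new to the metric: word error rate is the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. Below is a minimal Python sketch of that calculation. The example sentences are invented for illustration, and published benchmark figures also depend on text normalization rules that this sketch does not try to reproduce.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one dropped word ("the") and one misrecognized word.
reference = "please send the report to the finance team by friday"
hypothesis = "please send the report to finance team by fryday"
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")
```

Two errors against ten reference words gives a WER of 20%, so a figure like 3.45% corresponds to roughly one error every 29 words.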

Parakeet also supports hotwords: you can bias it toward proper nouns, technical terms, or product names that matter to your work. If you dictate anything involving names, jargon, or specialized vocabulary, that feature alone makes it the right starting point.

If you speak English or any Western European language and you're not sure which model to use, use Parakeet. Switch only if you have a specific reason.

For lightweight English use: Moonshine v2 Medium

Best for: English-only workflows where you want a smaller footprint.
Size: ~200 MB — Languages: English only

Moonshine was built by Useful Sensors specifically for edge devices. The v2 Medium model has 245M parameters, and its ~200 MB download is about a third the size of Parakeet's. Accuracy holds up well for everyday dictation: 6.65% word error rate on standard benchmarks.

The key difference from Whisper-based models is architecture. Moonshine doesn't pad audio to fixed 30-second chunks like Whisper does, which means shorter utterances process without unnecessary overhead. It was designed to be efficient, and on Apple Silicon that efficiency is noticeable.
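To get a feel for the padding overhead, the sketch below computes how much of a fixed 30-second window is padding for utterances of different lengths. It assumes Whisper's 16 kHz input and counts samples only. The function name is hypothetical, and actual compute cost depends on the runtime, so treat it as an illustration rather than a benchmark.

```python
SAMPLE_RATE_HZ = 16_000  # Whisper expects 16 kHz mono audio
WINDOW_SECONDS = 30      # fixed input window that shorter audio is padded to


def padded_fraction(utterance_seconds: float) -> float:
    """Fraction of a 30-second window that is padding for one utterance."""
    if utterance_seconds <= 0:
        raise ValueError("utterance_seconds must be positive")
    window_samples = SAMPLE_RATE_HZ * WINDOW_SECONDS
    # Audio longer than the window would be chunked; cap it for this sketch.
    speech_samples = min(int(utterance_seconds * SAMPLE_RATE_HZ), window_samples)
    return (window_samples - speech_samples) / window_samples


for seconds in (2, 5, 15, 30):
    share = padded_fraction(seconds)
    print(f"{seconds:>2} s utterance: {share:.0%} of the window is padding")
```

A typical dictated sentence lasts only a few seconds, so most of a fixed window is padding. A model that accepts variable-length input skips that work.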

Choose Moonshine v2 Medium if you dictate in English, you want to keep your model download small, and you don't need hotwords or multilingual support.

For 99 languages: Whisper Large V3 Turbo

Best for: Any language not served by a dedicated model above.
Size: ~1 GB — Languages: 99 languages

OpenAI's Whisper is the model that proved high-accuracy offline transcription was possible at scale. Whisper Large V3 Turbo is a distilled version — 809M parameters with only 4 decoder layers instead of the full 32 — which makes it significantly faster while keeping the broad language support intact.

If you dictate in Arabic, Hindi, Vietnamese, Thai, Hebrew, Turkish, Indonesian, or any of the other 80+ languages that Parakeet and SenseVoice don't cover, Whisper Turbo is worth knowing about. It defaults to English but supports 99 languages, and runs well via whisper.cpp on Apple Silicon.

It's among the most widely used offline transcription models in the world. Resonant doesn't bundle Whisper today, but if you're comparing Resonant against another tool that uses Whisper in the cloud, that tool is likely running Whisper Turbo or a close relative, and you can run it locally yourself via whisper.cpp.

For faster English Whisper: Whisper Distil Large v3.5

Best for: English-only Whisper users who want more speed.
Size: ~1 GB — Languages: English only

Distil-Whisper is a knowledge-distilled version of Whisper Large V3, trained specifically on English. The accuracy on English short-form audio is within about 1% of the full Turbo model, but it runs 1.5x faster.

If your work is English-only and you find yourself transcribing frequently throughout the day, that speed difference adds up. The download size is comparable to Whisper Turbo, so the only tradeoff is losing multilingual support you may not need.

For East Asian languages: SenseVoice Small

Best for: Chinese, Japanese, Korean, and Cantonese.
Size: ~226 MB — Languages: Mandarin, English, Japanese, Korean, Cantonese

SenseVoice was built by Alibaba Research for exactly this use case. It uses a non-autoregressive CTC architecture that transcribes roughly ten times faster than real time: its real-time factor (RTF) of 0.10 means a ten-second clip takes about one second to transcribe. That's very fast.
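Real-time factor is processing time divided by audio duration, so lower is faster and the speedup over real time is 1 / RTF. The short Python sketch below applies that arithmetic to the RTF values quoted in this guide. The helper names are illustrative, and real throughput varies with hardware and clip length.

```python
def processing_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimated transcription time: audio duration multiplied by RTF."""
    return audio_seconds * rtf


def speedup(rtf: float) -> float:
    """How many times faster than real time a given RTF is."""
    return 1.0 / rtf


# RTF values as quoted in this guide.
quoted_rtf = {
    "SenseVoice Small": 0.10,
    "Zipformer Japanese": 0.08,
    "Zipformer Korean": 0.034,
}

for model, rtf in quoted_rtf.items():
    seconds = processing_seconds(60, rtf)
    print(f"{model}: 60 s of audio in ~{seconds:.1f} s ({speedup(rtf):.1f}x real time)")
```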

It auto-detects across its five supported languages, so mixed-language dictation between, say, Mandarin and English works without switching modes. At 226 MB, the download is light for what it covers.

If you regularly switch between English and any East Asian language, SenseVoice is the right choice. For Mandarin-primary users who need the highest possible accuracy on Chinese, consider FireRedASR Large instead (below).

For Mandarin accuracy: FireRedASR Large

Best for: Mandarin-primary speakers who prioritize accuracy above all else.
Size: ~1.7 GB — Languages: Mandarin Chinese + English

FireRedASR Large is the best locally-runnable Mandarin model available. Built by the FireRed team on an attention encoder-decoder architecture, it achieves 3.18% character error rate on Mandarin benchmarks — state of the art for an offline model. It also handles Chinese dialects and code-switching between Mandarin and English.

The download is larger at 1.7 GB, and it runs somewhat slower than SenseVoice. But if your primary use is professional Mandarin dictation — documents, correspondence, clinical notes — the accuracy difference is worth it.

For Japanese: Zipformer Japanese

Best for: Japanese-only speakers who want the highest accuracy.
Size: ~148 MB — Languages: Japanese only

This model was trained on 35,000 hours of ReazonSpeech v2.0 data, one of the largest Japanese speech corpora publicly available. The Icefall Zipformer architecture runs at RTF 0.08, roughly 12x faster than real time on any modern Mac.

SenseVoice covers Japanese, but if Japanese is your primary or only language, this dedicated model will generally outperform it. 148 MB is a compact download for what it delivers.

For Korean: Zipformer Korean

Best for: Korean speakers.
Size: ~68 MB — Languages: Korean only

At 68 MB, Zipformer Korean is one of the smallest local STT models around. It runs about 29x faster than real time (RTF 0.034), the lowest RTF quoted in this guide. A minute of speech processes in about two seconds on Apple Silicon.

If you dictate in Korean, this is the right choice. The download is negligible and the speed is unmatched.

For Russian: GigaAM v2 Russian

Best for: Russian-primary speakers.
Size: ~231 MB — Languages: Russian only

GigaAM v2 comes from SaluteSpeech, Sber's speech AI research team. It's commercially licensed and uses a NeMo transducer architecture trained specifically on Russian speech. It's the best locally-runnable Russian ASR model available.

If Russian is your primary dictation language, this model will significantly outperform Whisper Turbo on Russian content. 231 MB is compact for the coverage it provides.

For everything else: omniASR 300M

Best for: Any language not covered above.
Size: ~348 MB — Languages: 1,600+ languages

Meta's omniASR is a CTC model trained across over 1,600 languages. If you speak a language that isn't served by any of the dedicated models above and isn't in Whisper's 99-language set, omniASR is your option.

It covers many low-resource languages that no other local model does. Accuracy on well-represented languages is good; on lower-resource ones it varies, as it does with any model trained on limited data. But for languages with no other local option, it's a meaningful baseline.

Quick reference

Model | Size | Best for
Parakeet TDT 0.6B v3 | 640 MB | English + 24 European languages. Start here.
Moonshine v2 Medium | 200 MB | English only. Lightweight alternative to Parakeet.
Whisper Large V3 Turbo | 1 GB | 99 languages. Use for languages not covered by a dedicated model.
Whisper Distil Large v3.5 | 1 GB | English only. Faster than Turbo, accuracy within about 1%.
SenseVoice Small | 226 MB | Chinese, Japanese, Korean, Cantonese, English.
FireRedASR Large | 1.7 GB | Mandarin-primary. Best Chinese accuracy.
Zipformer Japanese | 148 MB | Japanese-only. Trained on 35k hours.
Zipformer Korean | 68 MB | Korean-only. Smallest and fastest model.
GigaAM v2 Russian | 231 MB | Russian-only. Best Russian accuracy.
omniASR 300M | 348 MB | 1,600+ languages. Universal fallback.
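If you are wiring several of these models into your own pipeline, the guide's recommendations can be sketched as a small lookup. The Python below is a rough illustration, not anything Resonant ships. The function name and the ISO-style language codes are assumptions, and the language sets list only the languages this guide names explicitly, so fill in the full lists before relying on it.

```python
from collections.abc import Iterable

# Partial sets: only languages named explicitly in this guide (assumed codes).
PARAKEET_NAMED = {"en", "de", "es", "it", "fr"}
SENSEVOICE = {"zh", "yue", "ja", "ko", "en"}
WHISPER_NAMED = {"ar", "hi", "vi", "th", "he", "tr", "id"}
DEDICATED = {
    "zh": "FireRedASR Large",
    "ja": "Zipformer Japanese",
    "ko": "Zipformer Korean",
    "ru": "GigaAM v2 Russian",
}


def suggest_model(languages: Iterable[str]) -> str:
    """Map the languages a user dictates in to the model this guide suggests."""
    langs = {code.lower() for code in languages}
    if not langs:
        raise ValueError("provide at least one language code")
    if langs <= PARAKEET_NAMED:
        return "Parakeet TDT 0.6B v3"
    if len(langs) == 1:
        (only,) = langs
        if only in DEDICATED:
            return DEDICATED[only]  # single-language specialists
    if langs <= SENSEVOICE:
        return "SenseVoice Small"  # mixed East Asian / English dictation
    if langs <= PARAKEET_NAMED | SENSEVOICE | WHISPER_NAMED | {"ru"}:
        return "Whisper Large V3 Turbo"
    return "omniASR 300M"  # fallback for anything not listed above


print(suggest_model({"en"}))        # English only
print(suggest_model({"zh", "en"}))  # Mandarin and English mixed
print(suggest_model({"ar"}))        # a language named under Whisper
print(suggest_model({"xx"}))        # "xx" is a placeholder for an unlisted language
```

The order of the checks encodes the guide's priorities: Parakeet first, then a dedicated single-language model, then SenseVoice for mixed East Asian dictation, with Whisper and omniASR as progressively broader fallbacks.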

What Resonant uses

Resonant ships with Parakeet on Apple Silicon — chosen for its accuracy on English and 24 European languages, low latency, and Neural Engine fit. The other models in this guide are useful context if you're evaluating local STT broadly, or running your own pipeline.

All transcription in Resonant runs on your Mac. No account required. No audio sent anywhere.

Download Resonant.
