Updated Apr 28, 2026

SenseVoice for East Asian Dictation on Mac: A 2026 Guide

SenseVoice is one of the most popular open-weight speech models for East Asian languages. If you're researching offline dictation across Mandarin, Japanese, Korean, or Cantonese on Mac, SenseVoice is going to come up. Here's what it is, where it shines, and what we ship in Resonant for East Asian dictation instead.

What SenseVoice is

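SenseVoice is an open-weight ASR model from Alibaba's FunASR project. The Small variant is around 226 MB and supports five languages out of the box: Mandarin Chinese, English, Japanese, Korean, and Cantonese. It uses a non-autoregressive architecture, which predicts the output tokens in parallel rather than one at a time, so it decodes very fast: a real-time factor (RTF) of around 0.10 on modest hardware. RTF is processing time divided by audio duration, so anything below 1.0 is faster than real time, and 0.10 means a minute of audio takes roughly six seconds to transcribe.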
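To make the RTF figure concrete, here is a minimal Python sketch of the arithmetic. The function names are ours for illustration and are not part of SenseVoice or FunASR, and the timings are made-up inputs chosen to match the 0.10 figure, not measurements.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent decoding / duration of the audio that was decoded."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def estimated_processing_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimate how long a clip takes to decode at a given RTF."""
    return audio_seconds * rtf


# Illustrative numbers only: a 60-second clip decoded in 6 seconds.
rtf = real_time_factor(processing_seconds=6.0, audio_seconds=60.0)

# At that RTF, a one-hour recording would take about six minutes.
one_hour_estimate = estimated_processing_seconds(audio_seconds=3600.0, rtf=rtf)

print(f"RTF: {rtf:.2f}")
print(f"Estimated time for 1 hour of audio: {one_hour_estimate / 60:.1f} minutes")
```

Measured RTF depends heavily on hardware, quantization, and clip length, so treat any single number as a rough guide.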

Beyond raw transcription, SenseVoice also predicts emotion and audio events (laughter, applause, music, etc.), which is useful for some downstream tasks but not generally for live dictation.
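If you do run SenseVoice and only want the plain text for dictation, the extra labels have to be separated from the transcript. The sketch below assumes the labels arrive as `<|...|>` tokens mixed into the raw output string, which is how we understand SenseVoice's raw output to look; the exact tag names and the sample string are assumptions for illustration, so check them against your own model output.

```python
import re

# Assumption: labels (language, emotion, audio event, etc.) appear as <|name|> tokens.
TAG_PATTERN = re.compile(r"<\|([^|]*)\|>")


def split_tags(raw: str) -> tuple[list[str], str]:
    """Return (tags, plain_text) from a raw transcript string."""
    tags = TAG_PATTERN.findall(raw)
    text = TAG_PATTERN.sub("", raw).strip()
    return tags, text


# Hypothetical raw output, written by hand for this example.
raw_output = "<|en|><|HAPPY|><|Speech|><|woitn|>thanks for joining the call"
tags, text = split_tags(raw_output)

print(tags)
print(text)
```

For live dictation only `text` would be inserted at the cursor; the tags are more useful in batch pipelines that index or filter recordings.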

Where SenseVoice is the right choice

For batch transcription of East Asian audio — meetings, podcasts, video captions across Mandarin, Japanese, Korean, and Cantonese — SenseVoice Small is hard to beat for its size. The five-language coverage is a real strength, and the speed makes long-audio jobs cheap.

Why Resonant ships Qwen3 ASR

Resonant ships Alibaba's newer Qwen3 ASR 0.6B as its multilingual model. SenseVoice and Qwen3 ASR come from the same broader ecosystem, but Qwen3 ASR is the more recent flagship. It covers 30+ languages instead of five and treats code-switching as a first-class case: speaking Mandarin and English in the same sentence, or Japanese and English, without the model locking onto one language or the other.
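When comparing models on code-switching, it helps to confirm that your test sentences really do mix languages. The following is a rough Python heuristic that checks which writing systems appear in a transcript using Unicode character names. It is our own illustration, not how Qwen3 ASR or SenseVoice handle code-switching, and it only sees scripts, so it cannot tell Mandarin from Cantonese or English from other Latin-script languages.

```python
import unicodedata


def scripts_in(text: str) -> set[str]:
    """Return the coarse set of scripts used by the letters in text."""
    scripts = set()
    for ch in text:
        if not ch.isalpha():
            continue  # skip spaces, digits, punctuation
        name = unicodedata.name(ch, "")
        if name.startswith("CJK UNIFIED"):
            scripts.add("han")
        elif name.startswith(("HIRAGANA", "KATAKANA")):
            scripts.add("kana")
        elif name.startswith("HANGUL"):
            scripts.add("hangul")
        elif name.startswith("LATIN"):
            scripts.add("latin")
    return scripts


def mixes_latin_and_east_asian(text: str) -> bool:
    """True if the text contains Latin letters plus at least one East Asian script."""
    found = scripts_in(text)
    return "latin" in found and bool(found - {"latin"})


print(scripts_in("我们明天开个 meeting 吧"))
print(mixes_latin_and_east_asian("明日のミーティングは online です"))
print(mixes_latin_and_east_asian("明日の会議"))
```

Japanese text routinely combines han and kana, which is why the check looks specifically for Latin letters alongside an East Asian script rather than for any two scripts.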

For real-time dictation, the breadth and code-switching fluency matter more than emotion detection. Qwen3 ASR is also small enough to compile to Core ML and run on the Apple Neural Engine with sub-second latency, which lets us treat speech-to-text as a system-wide input method rather than a transcription job.

The short version

SenseVoice Small is excellent for batch East Asian transcription with a tight, fast model. For live dictation across East Asian languages and English, Resonant ships Qwen3 ASR on the Apple Neural Engine, fully on-device, no cloud.

Download Resonant to try Mandarin, Japanese, Korean, or Cantonese dictation on your Mac.