On-Device Meeting Transcription on Apple Silicon
Apple Silicon's Neural Engine makes real-time meeting transcription possible without the cloud. Here's how it works, what models run, and why the accuracy matches cloud services.
TL;DR
The Neural Engine in M-series Macs runs speech recognition models at real-time speed with minimal power draw. Resonant compiles Parakeet to CoreML and executes it on this dedicated hardware — achieving cloud-level accuracy for meeting transcription without any network dependency.
Why Apple Silicon changes the equation
Before Apple Silicon, running speech recognition locally meant GPU-heavy inference that drained batteries and produced noisy fans. Cloud was the practical option because consumer hardware wasn't fast enough for real-time transcription at acceptable quality.
Apple Silicon changed this. Every M-series chip includes a dedicated Neural Engine, a hardware accelerator designed specifically for machine learning inference. It runs neural networks at high throughput with minimal power consumption, separate from the CPU and GPU.
The technical pipeline
- Audio capture. System audio (other participants) and microphone (your voice) are captured simultaneously.
- Preprocessing. Audio is converted to mel spectrograms — the input format speech models expect.
- Neural Engine inference. CoreML-compiled Parakeet TDT processes the spectrograms on the Neural Engine.
- Decoding. Model output is decoded into text with timestamps and optional speaker labels.
- Post-processing. Punctuation, capitalization, and filler removal are applied.
This entire pipeline runs in real time on a MacBook Air. No fan noise, minimal battery impact, no network.
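The preprocessing step above can be sketched in plain NumPy. The parameters below (16 kHz audio, a 512-sample FFT with a 10 ms hop, 80 mel bins) are typical for speech models but are assumptions for illustration, not Resonant's actual configuration:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_spectrogram(audio, sr=16000, n_fft=512, hop=160, n_mels=80):
    """Convert a mono waveform into a log-mel spectrogram."""
    # Frame the signal into overlapping windows
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack([audio[i * hop : i * hop + n_fft] for i in range(n_frames)])
    frames *= np.hanning(n_fft)
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Triangular mel filterbank mapping FFT bins onto the mel scale
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return np.log(power @ fbank.T + 1e-10)  # shape: (frames, n_mels)

# One second of 16 kHz audio yields 97 frames of 80 mel bins
spec = mel_spectrogram(np.zeros(16000))
print(spec.shape)  # (97, 80)
```

The model never sees raw samples; it sees these (frames × mel bins) matrices, which is what makes fixed-shape CoreML inference on the Neural Engine straightforward.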
Models available
- Parakeet TDT 0.6B. NVIDIA's state-of-the-art on-device English STT. Highest accuracy for English meetings. Fast inference, small model size, runs on the Neural Engine.
Performance by chip
- M1 (including the fanless MacBook Air). Comfortably real-time transcription with Parakeet.
- M2 / M3. Comfortably real-time with all models. Room for concurrent workloads.
- M3 Pro/Max, M4. Well under real time. Can process longer audio segments in bursts for even lower latency.
Accuracy comparison
On standard English meeting benchmarks, Parakeet TDT running locally on Apple Silicon achieves word error rates (WER) comparable to cloud APIs from Google, AWS, and AssemblyAI. The gap has closed because:
- Modern speech models are surprisingly compact: a 600M-parameter model runs efficiently on-device.
- The Neural Engine provides enough TOPS (trillions of operations per second) for real-time inference.
- CoreML optimization and quantization reduce model size without meaningful accuracy loss.
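Word error rate, the metric behind these comparisons, is the word-level edit distance between a reference transcript and the model's hypothesis, divided by the reference length. A minimal implementation:

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + insertions + deletions)
    divided by the number of reference words."""
    r, h = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(r)][len(h)] / len(r)

# One substitution in five reference words → 20% WER
print(wer("the meeting starts at noon", "the meeting started at noon"))  # 0.2
```

When a benchmark reports a model "under 4% WER", it means fewer than four word-level errors per hundred reference words.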
Related resources
- Offline meeting transcription — the offline workflow this enables.
- Automatic meeting notetaker for Mac — the full product overview.
- Local STT models in 2026 — deeper dive on available models.
- Parakeet TDT on Mac — the English-optimized model.
Frequently asked questions
Can Apple Silicon run meeting transcription locally?
Yes. The Neural Engine in all M-series chips provides real-time speech recognition without cloud dependency.
Is on-device accuracy as good as cloud?
Yes. Parakeet TDT on Apple Silicon matches or exceeds many cloud APIs for English.
Which Mac do I need?
Any Apple Silicon Mac: M1 or later (including Air, Pro, Max, Ultra variants). Intel Macs are not supported.
Does transcription drain battery?
The Neural Engine is power-efficient by design. A 60-minute meeting transcription adds minimal battery impact — comparable to light background processing.
What Resonant offers beyond dictation
Resonant isn't just a faster way to type. It's a voice workspace with capabilities no other dictation tool provides.
MCP server for AI tools
Resonant exposes 11 MCP tools that let any AI agent (Claude, Codex, and more) query your entire voice workspace: meetings, dictations, memos, ambient context, and daily journal. Your AI assistant knows what you said this morning.
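MCP is a JSON-RPC 2.0 protocol, so an agent invokes a tool with a `tools/call` request. The tool name and arguments below are hypothetical, for illustration only; Resonant's actual tool names may differ:

```python
import json

# Hypothetical tool name and arguments -- not Resonant's real API.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_meetings",
        "arguments": {"query": "quarterly roadmap", "since": "2026-01-01"},
    },
}
# The agent sends this over the MCP transport (stdio or HTTP)
print(json.dumps(request, indent=2))
```

From the agent's perspective, your voice workspace is just another set of callable tools, which is what lets assistants answer questions like "what did I say this morning?"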
Meeting transcription with speaker labels
Dual-channel recording keeps your mic and system audio on separate channels. NVIDIA Sortformer diarization identifies who said what. No bot joins the call, and no audio leaves your Mac.
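The advantage of separate channels is that your own speech never needs diarization; only the system-audio channel does, and the two transcripts can then be interleaved by timestamp. A minimal sketch, with a segment format invented for illustration:

```python
def merge_channels(mic_segments, remote_segments):
    """Interleave per-channel transcript segments by start time.
    Each segment is (start_seconds, speaker_label, text)."""
    return sorted(mic_segments + remote_segments, key=lambda s: s[0])

mic = [(0.0, "You", "Morning, everyone."),
       (12.5, "You", "I can take that item.")]
remote = [(3.2, "Speaker 1", "Let's review the roadmap."),
          (9.8, "Speaker 2", "Agreed.")]

for start, who, text in merge_channels(mic, remote):
    print(f"[{start:5.1f}] {who}: {text}")
```

The mic channel gets a known label for free, while the diarizer only has to separate the remote participants.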
Ambient context capture
Passively records which apps you use, window titles, URLs, and dwell time, all locally. This makes dictation context-aware and gives your AI tools a queryable work timeline.
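Dwell time from such a timeline can be derived by attributing the gap between consecutive samples to the app that was frontmost. The event format here is invented for illustration, not Resonant's actual schema:

```python
from collections import defaultdict

def dwell_by_app(events):
    """events: (timestamp_seconds, app_name) samples in chronological
    order. Dwell per app = time until the next sample."""
    totals = defaultdict(float)
    for (t0, app), (t1, _) in zip(events, events[1:]):
        totals[app] += t1 - t0
    return dict(totals)

# Hypothetical five-and-a-half-minute slice of a work session
events = [(0, "Xcode"), (120, "Safari"), (180, "Xcode"),
          (300, "Mail"), (330, "Mail")]
print(dwell_by_app(events))  # {'Xcode': 240.0, 'Safari': 60.0, 'Mail': 30.0}
```

Aggregations like this are what make the timeline queryable: an AI tool can answer "how long was I in Xcode this morning?" without any audio being involved.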
Two on-device speech models
NVIDIA Parakeet TDT v3 (0.6B, 25 languages) and Qwen3 ASR (0.6B, 30+ languages), both compiled to CoreML and running on the Apple Neural Engine. Under 4% WER on English benchmarks.
Cloud cleanup with hallucination detection
Optional AI post-processing fixes STT errors and adapts to context (email, message, code). Guardrails detect when the LLM rewrites your meaning instead of just cleaning your grammar.
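One way such a guardrail can work is to check how much of the original wording survives the cleanup: a legitimate grammar pass keeps most content words, while a hallucinated rewrite replaces them. This crude word-overlap check is a sketch of the idea, not Resonant's actual detector:

```python
import re

def words(s):
    """Lowercase word tokens, ignoring punctuation."""
    return re.findall(r"[a-z']+", s.lower())

def meaning_drift(original, cleaned, threshold=0.5):
    """Flag a cleanup as suspicious when too few of the cleaned
    words come from the original utterance."""
    orig = set(words(original))
    cleaned_words = words(cleaned)
    kept = sum(1 for w in cleaned_words if w in orig)
    return kept / max(len(cleaned_words), 1) < threshold

spoken = "um so lets ship it friday"
print(meaning_drift(spoken, "Lets ship it Friday."))        # False: grammar cleanup
print(meaning_drift(spoken, "We should delay the launch."))  # True: meaning rewritten
```

A production guardrail would be more sophisticated (semantic similarity rather than token overlap), but the principle is the same: measure how far the output drifts from what was actually said.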
Start with private Mac dictation
Local speech recognition is free and runs on your Mac. Pro adds cloud cleanup, rewrites, summaries, and sharing when you want the full workflow.