
Guide · May 3, 2026

Transcribe Audio to Text on Mac
3 Free Ways (2026)

You have a recording — a meeting, interview, lecture, or voice memo — and you need it as text. macOS doesn't have a built-in file transcription tool, but there are three free options that run locally on your Mac.

Option 1: macOS Dictation (workaround)

macOS Dictation is designed for live speech, not file transcription. The workaround — playing audio through speakers and running dictation simultaneously — is unreliable and inaccurate.

Why it doesn't work well

30–60 second timeout — you'd need to restart it constantly through a long recording.
Picks up ambient noise — transcribing from speakers means room noise degrades accuracy.
No file input — there is no way to feed an audio file directly to Apple Dictation.
No timestamps or speaker labels — just raw text.

Option 2: Whisper via command line

OpenAI's Whisper is an open-source speech recognition model. You can run it locally on your Mac for free.

Setup

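Install Python 3.8+ (via Homebrew: brew install python).
Install ffmpeg: brew install ffmpeg
Create and activate a virtual environment, because recent Homebrew Python builds block system-wide pip installs: python3 -m venv ~/whisper-env && source ~/whisper-env/bin/activate (the ~/whisper-env path is only an example).
Install Whisper: pip install -U openai-whisper
Transcribe: whisper audio.mp3 --model medium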
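
For more than a handful of files, the transcription step can be scripted. The sketch below builds the same whisper command as a Python argument list and runs it only if the CLI is actually on the PATH. The helper names (build_whisper_command, transcribe) and the transcripts output folder are illustrative assumptions, not part of Whisper; --model, --output_format, and --output_dir are options of the openai-whisper command-line tool.

```python
import shutil
import subprocess
from pathlib import Path


def build_whisper_command(audio_path, model="medium", output_format="txt",
                          output_dir="transcripts"):
    """Return the argument list for one `whisper` CLI invocation."""
    return [
        "whisper", str(audio_path),
        "--model", model,
        "--output_format", output_format,
        "--output_dir", str(output_dir),  # hypothetical folder name
    ]


def transcribe(audio_path, **options):
    """Run the CLI if it is installed; return its exit code, or None if it is missing."""
    if shutil.which("whisper") is None:
        return None
    return subprocess.run(build_whisper_command(audio_path, **options)).returncode


if __name__ == "__main__":
    # Show the command without running it.
    print(" ".join(build_whisper_command(Path("audio.mp3"))))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.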

What it's good at

High accuracy, especially with the medium or large models. Supports about 100 languages. Outputs TXT, SRT, VTT, TSV, and JSON. Fully local: no data is sent anywhere.
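
SRT and VTT are plain-text subtitle formats that pair each piece of text with a start and end time. As a rough sketch of what an SRT file contains, the code below turns a list of timed segments into SRT cues. The input shape (dictionaries with start, end, and text keys, times in seconds) is an assumption made for illustration.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm (SRT puts a comma before the milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(cues)
```

VTT is nearly identical: it starts with a WEBVTT header line and uses a period instead of a comma before the milliseconds.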

Where it falls short

Technical setup — requires Python, pip, and command line comfort.
Slower without optimization — the vanilla Python Whisper doesn't leverage Apple Neural Engine. Transcription can be slow without CoreML compilation.
No GUI — command line only. No drag-and-drop.
No speaker labels out of the box (needs additional tools for diarization).

Option 3: Resonant

Resonant combines modern on-device transcription accuracy with the convenience of a native Mac app. Drag and drop an audio file, get a transcript.

What it's good at

Drag and drop — no command line, no Python, no setup.
Fast — speech models compiled to CoreML for Apple Neural Engine. 50–150x realtime on M-series chips.
Parakeet on Apple Silicon — NVIDIA's state-of-the-art STT model running on the Neural Engine.
Speaker labels — identifies who said what.
Timestamps — word and segment-level timing.
Export — TXT, Markdown, SRT, VTT.
Fully offline — your audio never leaves your Mac.

Setup

Download Resonant and open it.
Drag an audio or video file onto the app.
Choose your model and language.
The transcript appears in seconds to minutes depending on file length.

Side-by-side comparison

Here's how the three options compare for audio file transcription.

Feature	macOS Dictation	Whisper CLI	Resonant
Transcribes audio files	No — live speech only	Yes — any audio/video file	Yes — drag and drop
Setup difficulty	None (built in)	High (Python, pip, command line)	None (download and run)
Accuracy	Low for playback transcription	High (Whisper medium/large)	High (Parakeet)
Speed on Apple Silicon	N/A	~10–30x realtime (depends on setup)	~50–150x realtime (CoreML optimized)
Supported formats	N/A	MP3, WAV, M4A, FLAC, MP4, etc.	MP3, WAV, M4A, FLAC, MP4, MOV, etc.
Speaker labels	No	Not built in (needs separate diarization tools)	Yes
Timestamps	No	Yes (SRT/VTT output)	Yes
Export formats	N/A	TXT, SRT, VTT, TSV, JSON	TXT, Markdown, SRT, VTT
Internet required	Partially	No — fully local	No — fully local
Cost	Free	Free (open source)	Free

Which one should you use?

If you're comfortable with the command line: Whisper is excellent. Free, accurate, and local. Just know the setup takes some effort.

If you want the same accuracy without the setup: Resonant. Drag a file, get a transcript. CoreML-optimized for speed on Apple Silicon.

Don't use macOS Dictation for file transcription. It's not designed for it and the results are poor.

Ready to transcribe recordings locally?

Download Resonant for Mac

Free · macOS 14+ · Apple Silicon

Frequently asked questions

Can macOS transcribe audio files for free?

Not directly. macOS Dictation is for live speech. For file transcription, use Whisper (command line) or Resonant (drag and drop).

Is Whisper transcription free?

Yes. The Whisper model is open-source and runs locally. No subscription, no API cost.

What audio formats can be transcribed?

MP3, M4A, WAV, FLAC, OGG, and most other common audio formats work with both Whisper and Resonant. Video files (MP4, MOV) work too: the audio track is extracted automatically.
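
For video input, only the audio track matters. If you ever need to do that extraction by hand, ffmpeg can do it. The sketch below builds such a command; the function name and the 16 kHz mono WAV target are assumptions chosen for illustration, not a description of what either tool does internally.

```python
from pathlib import Path

VIDEO_EXTENSIONS = {".mp4", ".mov"}


def extraction_command(media_path, sample_rate=16000):
    """Return an ffmpeg argument list that extracts audio, or None for audio files."""
    source = Path(media_path)
    if source.suffix.lower() not in VIDEO_EXTENSIONS:
        return None  # already an audio file; nothing to extract
    target = source.with_suffix(".wav")
    return [
        "ffmpeg", "-i", str(source),
        "-vn",                    # drop the video stream
        "-ac", "1",               # downmix to a single (mono) channel
        "-ar", str(sample_rate),  # resample to the requested rate
        str(target),
    ]
```

The function only builds the command; pass the list to subprocess.run to execute it.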

How long does transcription take?

On Apple Silicon, expect 10–150x realtime. A 1-hour recording takes roughly 24 seconds (at 150x) to 6 minutes (at 10x), depending on model and chip.
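
A realtime multiple converts to wall-clock time by simple division: processing time equals audio duration divided by the speed factor. A small sketch of that arithmetic, with the function name chosen here for illustration:

```python
def transcription_seconds(audio_seconds, realtime_factor):
    """Estimated processing time: audio duration divided by the speed multiple."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


one_hour = 60 * 60
for factor in (10, 50, 150):
    print(f"{factor}x realtime: {transcription_seconds(one_hour, factor):.0f} s")
```

For a 1-hour file this gives 360 seconds at 10x, 72 seconds at 50x, and 24 seconds at 150x. Actual times also include model loading, which this estimate ignores.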

Does transcription require internet?

Not with Whisper or Resonant. Both run speech models locally. Cloud services (Otter.ai, Rev) require internet.

What Resonant offers beyond dictation

Resonant isn't just a faster way to type. It's a voice workspace with capabilities no other dictation tool provides.

MCP server for AI tools

Resonant exposes 11 MCP tools that let any AI agent (Claude, Codex, and more) query your entire voice workspace: meetings, dictations, memos, ambient context, and daily journal. Your AI assistant knows what you said this morning.

Meeting transcription with speaker labels

Dual-channel recording keeps your mic and system audio on separate channels. NVIDIA Sortformer diarization identifies who said what. No bot joins the call. No audio leaves your Mac.

Ambient context capture

Resonant passively records which apps you use, window titles, URLs, and dwell time, all locally. This makes dictation context-aware and gives your AI tools a queryable work timeline.

Two on-device speech models

NVIDIA Parakeet TDT v3 (0.6B, 25 languages) and Qwen3 ASR (0.6B, 30+ languages), both compiled to CoreML and running on the Apple Neural Engine. Under 4% word error rate (WER) on English benchmarks.
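
WER is the standard accuracy metric behind a figure like this: the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference text, divided by the number of words in the reference. A minimal sketch, assuming plain whitespace tokenization (benchmarks typically normalize case and punctuation first):

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1      # reference word missing from transcript
            insertion = current[j - 1] + 1  # extra word in transcript
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

One wrong word in a six-word sentence gives a WER of about 16.7%, so a rate under 4% means fewer than one error per 25 words on average.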

Cloud cleanup with hallucination detection

Optional AI post-processing fixes STT errors and adapts to context (email, message, code). Guardrails detect when the LLM rewrites your meaning instead of cleaning your grammar.

Start with private Mac dictation

Local speech recognition is free and runs on your Mac. Pro adds cloud cleanup, rewrites, summaries, and sharing when you want the full workflow.

