WER expectations by environment

Word error rate (WER) is dominated by the audio channel, not by the model. The same Nordic Whisper finetune produces WER that differs by a factor of 2-3 depending on where the audio comes from. This page sets honest expectations per deployment scenario.
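For reference, WER is the word-level edit distance between the hypothesis and the reference transcript, divided by the number of reference words. The sketch below is a minimal illustration of that definition. It assumes whitespace tokenization and no text normalization (casing, punctuation), which may differ from what the deployed pipeline does; the function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against four reference words.
    print(word_error_rate("a b c d", "a x c"))
```

Because the denominator is the reference length, WER can exceed 100% when the hypothesis contains many insertions.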

Last updated: 2026-04-24

The four tiers

Tier 1 — Lab / clean read-speech
~6–10% WER
FLEURS, NPSC clean read-speech, 16kHz mono. Used for academic benchmarks. Not representative of real call audio.
Tier 2 — Genesys AudioHook (production target)
~12–18% WER
Wide-band 16kHz audio direct from Genesys Cloud over WSS. PCMU codec, but none of the degradation of international PSTN. Stereo channel split for caller and agent. This is what your customers will see.
Tier 3 — PSTN-international demo (Twilio / 46elks)
~18–28% WER
8kHz μ-law over international PSTN, with multiple codec hops, audio compression, and network jitter. This is what the public demo at demo.muninlabs.io uses. Representative of the worst case.
Tier 4 — Background-noise call-center realism
~22–35% WER
Real production audio with keyboard noise, multiple speakers, hold-music bleed, accent variations. Improves with per-customer LoRA finetune (Q3 2026 premium feature).
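
To get a feel for how much of the gap between the 16kHz and 8kHz μ-law tiers comes from the codec alone, the narrow-band channel can be roughly approximated offline. The sketch below is an illustration under stated assumptions, not the simulation used for the figures on this page: it halves the sample rate by averaging sample pairs (a crude low-pass filter) and applies the continuous μ-law companding formula with 8-bit quantization, not the exact segmented G.711 encoder. It models none of the codec hops or jitter of a real PSTN path. All function names are hypothetical.

```python
import math

MU = 255  # companding parameter of the mu-law variant of G.711


def downsample_by_two(samples: list[float]) -> list[float]:
    """Crude 16 kHz -> 8 kHz conversion: average each pair of samples."""
    return [(samples[i] + samples[i + 1]) / 2
            for i in range(0, len(samples) - 1, 2)]


def mu_law_compress(x: float) -> float:
    """Continuous mu-law compression of a sample in [-1, 1]."""
    x = max(-1.0, min(1.0, x))
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def quantize_8bit(y: float) -> float:
    """Round a value in [-1, 1] to one of 256 evenly spaced levels."""
    code = round((y + 1.0) / 2.0 * 255)
    return code / 255 * 2.0 - 1.0


def simulate_telephony(samples_16k: list[float]) -> list[float]:
    """Downsample, then pass each sample through compress/quantize/expand."""
    narrow = downsample_by_two(samples_16k)
    return [mu_law_expand(quantize_8bit(mu_law_compress(s))) for s in narrow]


if __name__ == "__main__":
    print(simulate_telephony([0.5, 0.5, -0.25, -0.25]))
```

For real evaluation work, a proper resampler with an anti-aliasing filter and a standards-conformant G.711 encoder would be the safer choice.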

Real benchmark on our Icelandic (IS) YouTube clips

We tested the deployed pipeline on 7 real Icelandic clips from RÚV (news, interviews, podcasts) with manually verified ground truth, in both ideal (16kHz PCM) and telephony-simulated (8kHz μ-law) modes:

Clip | Type | Ideal WER | Telephony WER
clip01_news | News read-speech | 14.5% | 17.1%
clip02_news | News read-speech | 17.3% | 28.8%
clip04_interview | Interview, conversational | 24.6% | 39.1%
clip05_podcast | Podcast, two speakers | 16.4% | 18.2%
clip06_podcast | Podcast, clear single speaker | 6.0% | 12.0%
clip07_challenging | Marked "challenging": heavy accent + bg noise | 33.8% | 28.7%
Mean (excluding gt-mismatch outlier) | | ~18.8% | ~24.0%
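
The mean row can be reproduced from the six listed clips. The short script below is a sketch that copies the per-clip figures from the table, recomputes both means, and prints the per-clip difference in percentage points. It takes an unweighted average of per-clip WER; a corpus-level WER weighted by word count would generally give a different number.

```python
from statistics import mean

# (clip, ideal WER %, telephony WER %) copied from the table above
results = [
    ("clip01_news", 14.5, 17.1),
    ("clip02_news", 17.3, 28.8),
    ("clip04_interview", 24.6, 39.1),
    ("clip05_podcast", 16.4, 18.2),
    ("clip06_podcast", 6.0, 12.0),
    ("clip07_challenging", 33.8, 28.7),
]

ideal_mean = mean(ideal for _, ideal, _ in results)
telephony_mean = mean(tel for _, _, tel in results)

print(f"ideal mean:     {ideal_mean:.1f}%")      # 18.8%
print(f"telephony mean: {telephony_mean:.1f}%")  # 24.0%

# Per-clip change when moving from ideal to telephony-simulated audio.
for clip, ideal, tel in results:
    print(f"{clip}: {tel - ideal:+.1f} pp")
```

Note that clip07_challenging is the only clip where the telephony-simulated WER is lower than the ideal WER.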

Why the public demo will look worse than your production deployment

The Carl-test demo at demo.muninlabs.io uses Twilio (US) or 46elks (Sweden) PSTN bridges to receive your call. Both introduce ~5–13 percentage points of additional WER vs Tier 2 (Genesys AudioHook). What you see there is the worst case for our stack.

Production deployments via Genesys AudioHook receive 16kHz wide-band audio directly, with channel-split for caller and agent. Same models, dramatically better quality.
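
Channel-split audio lets caller and agent be transcribed separately, which avoids overlapping-speaker errors. As a minimal sketch of what the split involves, the function below de-interleaves a stereo buffer into two mono buffers. It assumes samples are interleaved frame by frame and that the first channel is the caller; both the channel order and the sample width are assumptions for illustration, not taken from the Genesys AudioHook specification, and the function name is hypothetical.

```python
def split_stereo(frame: bytes, sample_width: int = 2) -> tuple[bytes, bytes]:
    """De-interleave an interleaved stereo buffer into two mono buffers.

    sample_width is the size of one sample in bytes (assumed: 2 for
    16-bit PCM, 1 for 8-bit encodings such as mu-law).
    """
    stride = 2 * sample_width
    if len(frame) % stride:
        raise ValueError("buffer length is not a whole number of stereo frames")

    first = bytearray()   # assumed to be the caller channel
    second = bytearray()  # assumed to be the agent channel
    for offset in range(0, len(frame), stride):
        first += frame[offset:offset + sample_width]
        second += frame[offset + sample_width:offset + stride]
    return bytes(first), bytes(second)


if __name__ == "__main__":
    caller, agent = split_stereo(b"\x01\x02\x03\x04\x05\x06\x07\x08")
    print(caller, agent)
```

Each mono buffer can then be fed to the recognizer as its own stream, with speaker labels known up front instead of inferred by diarization.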

Methodology