Transcription Accuracy Benchmark 2026
We are curating a public dataset of 100 audio samples across English and LATAM Spanish dialects to measure word error rate (WER) across the major transcription tools. Methodology below; results and the dataset will be published in Q2 2026.
Methodology
We plan to evaluate each transcription tool on an open dataset of 100 audio samples, 10 per dialect. Each sample will be 60-120 seconds of natural speech (interviews, podcasts, meetings), with a ground-truth transcript produced by native-speaker transcribers at an inter-annotator agreement above 0.90.
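The agreement metric is not specified above; a common proxy for transcription tasks is 1 minus the pairwise WER between two annotators' transcripts. A minimal sketch of that check, assuming two independent transcripts per sample (the metric and helper below are illustrative, not the published procedure):

```python
import jiwer

def transcript_agreement(annotator_a: str, annotator_b: str) -> float:
    """Agreement proxy: 1 - WER between two annotators' transcripts.

    Which transcript serves as the reference is arbitrary, so we score
    both directions and average. Hypothetical metric; the report's exact
    agreement measure may differ.
    """
    wer_ab = jiwer.wer(annotator_a, annotator_b)
    wer_ba = jiwer.wer(annotator_b, annotator_a)
    return 1.0 - (wer_ab + wer_ba) / 2.0

# A sample would pass review only if agreement clears the 0.90 bar.
print(transcript_agreement(
    "the meeting starts at nine tomorrow",
    "the meeting starts at nine tomorrow morning",
))  # one extra word -> agreement ~0.85, just below the 0.90 bar
```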
Dialects covered
- US English (enUs) — General American, mixed gender and age
- UK English (enUk) — RP and regional British
- Mexican Spanish (esMx) — Chilango and northern Mexican
- Argentine Spanish (esAr) — Buenos Aires porteño with voseo
- Chilean Spanish (esCl) — Santiago, fast speech rate
- Colombian Spanish (esCo) — Bogotá and coastal variants
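For concreteness, here is one way a sample could be tagged in the dataset manifest. The schema below is a hypothetical sketch, not the published format; only the dialect codes come from the list above.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    """One benchmark audio sample (hypothetical manifest entry)."""
    sample_id: str       # e.g. "esAr-007"
    dialect: str         # one of: enUs, enUk, esMx, esAr, esCl, esCo
    duration_s: float    # 60-120 seconds of natural speech
    source_type: str     # "interview" | "podcast" | "meeting"
    audio_path: str      # path to the audio file
    reference_path: str  # path to the ground-truth transcript

example = Sample(
    sample_id="esAr-007",
    dialect="esAr",
    duration_s=94.5,
    source_type="podcast",
    audio_path="audio/esAr/007.wav",
    reference_path="transcripts/esAr/007.txt",
)
```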
Scoring
For each sample we compute WER as the sum of substitutions, deletions, and insertions divided by the reference word count, using the jiwer Python library. We normalize punctuation and casing before scoring. The final per-dialect WER is the mean across the 10 samples in that dialect.
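A minimal scoring sketch under those rules, assuming plain-text reference and hypothesis transcripts (the normalization helper is illustrative; the published scripts may normalize differently):

```python
import string
from statistics import mean

import jiwer

def normalize(text: str) -> str:
    """Strip punctuation and casing before scoring, per the methodology.

    Spanish inverted marks are removed explicitly because they are not
    in string.punctuation; accented letters are left intact.
    """
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation + "¿¡"))
    return " ".join(text.split())

def sample_wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    return jiwer.wer(normalize(reference), normalize(hypothesis))

# Per-dialect WER is the mean over that dialect's samples (toy pairs below).
pairs = [
    ("Hola, ¿cómo estás? Bienvenidos al podcast.",
     "hola como esta bienvenidos al podcast"),
    ("Hoy hablamos de la economía argentina.",
     "hoy hablamos de economía argentina"),
]
print(f"esAr WER: {mean(sample_wer(ref, hyp) for ref, hyp in pairs):.3f}")
```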
Tools under evaluation
We plan to evaluate the major speech-to-text APIs available in Q2 2026, covering both commercial cloud services and open-source baselines. The specific tools and version numbers will be disclosed in the published report to keep the methodology auditable.
Reproducibility commitment
On publication, the dataset, ground-truth transcripts, and evaluation scripts will be released on GitHub under CC BY 4.0. Anyone will be able to re-run the benchmark end to end with a single command and verify the numbers.
Test Transcapt on your own audio
15 free minutes. No credit card. Evaluate accuracy on your recordings while we finish the public benchmark.
Start free