Word Error Rate Benchmark
Measure real word error rate (WER) on your own data. Upload an audio file and a reference transcript to benchmark speech-to-text (STT) providers quantitatively and see which models perform best on your content.
WER compares each provider's output to your ground-truth transcript and counts the insertions, deletions, and substitutions, so you can evaluate transcription quality beyond marketing claims.
What is Word Error Rate?
WER = (Insertions + Deletions + Substitutions) / Total Words in Reference. Lower WER means higher accuracy. It's the industry standard for measuring speech recognition performance.
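As a minimal sketch of the formula above, WER can be computed from a word-level edit distance. It assumes plain whitespace tokenization and exact, case-sensitive word matching; the normalization this tool actually applies is not specified here, and the name `word_error_rate` is illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N, where N is the reference word count.

    Assumption: words are split on whitespace and compared exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr

    # The minimum number of edits equals S + D + I.
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on the the mat"
    print(f"WER: {word_error_rate(ref, hyp):.1%}")
```

Here one substitution ("sat" to "sit") and one insertion (the extra "the") against a six-word reference give 2 / 6.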
Upload Audio & Reference Transcript
Provide both your audio file and the correct transcript to enable WER calculations and detailed accuracy metrics.
Don't have a reference transcript? Try simple provider comparison →
Input Source
Audio file: click or drag to upload. Supports MP3, M4A, WAV, and OGG; max file size 100 MB.
Reference transcript: click to upload a TXT or MD file.
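The upload limits above can be checked locally before submitting. The sketch below is hypothetical: `check_upload` is not part of this tool, and it assumes "100MB" means 100 * 1024 * 1024 bytes, which the page does not state.

```python
from pathlib import Path

# Formats listed on this page; the byte interpretation of 100MB is an assumption.
AUDIO_EXTENSIONS = {".mp3", ".m4a", ".wav", ".ogg"}
TRANSCRIPT_EXTENSIONS = {".txt", ".md"}
MAX_AUDIO_BYTES = 100 * 1024 * 1024


def check_upload(audio_name: str, audio_size: int, transcript_name: str) -> list[str]:
    """Return a list of problems; an empty list means the files look acceptable."""
    problems = []
    if Path(audio_name).suffix.lower() not in AUDIO_EXTENSIONS:
        problems.append(f"unsupported audio format: {audio_name}")
    if audio_size > MAX_AUDIO_BYTES:
        problems.append(f"audio file exceeds 100 MB: {audio_size} bytes")
    if Path(transcript_name).suffix.lower() not in TRANSCRIPT_EXTENSIONS:
        problems.append(f"unsupported transcript format: {transcript_name}")
    return problems


if __name__ == "__main__":
    print(check_upload("interview.WAV", 42_000_000, "interview.txt"))
    print(check_upload("interview.flac", 200 * 1024 * 1024, "interview.docx"))
```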
Configuration
Advanced Options
Provides a hint for the minimum and maximum number of expected speakers to improve diarization accuracy.
Boosts the recognition probability of specific words or phrases, such as proper nouns or domain-specific terms. Provide one phrase per line.
Note: Files and transcripts are not stored on our servers and are used only to complete your request. More features are coming.
Understanding WER Results
WER Score
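The percentage of word-level errors relative to the number of words in the reference. 0% means the output matches the reference exactly; because insertions count as errors, WER can exceed 100%. Typical range for good models: 5-15%.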
Error Breakdown
See insertions (extra words), deletions (missing words), and substitutions (wrong words) to understand error types.
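One way to obtain such a breakdown is to trace back through the word-level edit-distance table and count each operation. This is a sketch under assumptions, not this tool's implementation: it uses whitespace tokenization and exact matching, and the names `ErrorBreakdown` and `error_breakdown` are illustrative. When several alignments are equally short, the split between error types depends on tie-breaking; this version prefers match or substitution, then deletion, then insertion.

```python
from typing import NamedTuple


class ErrorBreakdown(NamedTuple):
    substitutions: int  # wrong words
    deletions: int      # words missing from the hypothesis
    insertions: int     # extra words in the hypothesis


def error_breakdown(reference: str, hypothesis: str) -> ErrorBreakdown:
    ref, hyp = reference.split(), hypothesis.split()
    n, m = len(ref), len(hyp)

    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,
                dist[i - 1][j] + 1,
                dist[i][j - 1] + 1,
            )

    # Walk back from the bottom-right corner, counting each edit on one
    # optimal alignment path.
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                subs += cost
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return ErrorBreakdown(subs, dels, ins)


if __name__ == "__main__":
    print(error_breakdown("the cat sat on the mat", "the cat sit on the the mat"))
```

The three counts always sum to the edit distance, so dividing their total by the reference word count gives the WER score.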
Tips for Better Results
- Use clear, high-quality audio recordings
- Provide accurate reference transcripts with proper punctuation
- Test with your specific use case (accent, domain terminology)
- Compare multiple providers to find the best fit