February 8, 2026

WER Normalization with Nvidia NeMo is Live

Professional-grade Word Error Rate testing for STT/ASR models. High-quality multilingual normalization powered by Nvidia NeMo.

What's new

We've implemented comprehensive WER (Word Error Rate) calculation with three normalization levels, including integration with Nvidia NeMo's industry-standard text normalization service.

If you need to test STT or ASR models with high-quality multilingual WER metrics, you can now use:

  • /wer — Full WER testing with multiple normalization modes (raw, normalized, strict)
  • /normalize — Test our Nvidia NeMo-based normalization service directly

Three normalization levels

Raw mode

Character-level exact comparison with no normalization: case, punctuation, and spacing are all preserved. Best for debugging exact transcription output.

Normalized mode

Case-insensitive comparison with intelligent punctuation handling. Powered by Nvidia NeMo for consistent multilingual tokenization. Ideal for real-world accuracy testing.

Strict mode

Aggressive normalization using Nvidia NeMo's Inverse Text Normalization (ITN). It converts number words to digits ("twenty three" → "23"), removes all punctuation, and normalizes whitespace. Perfect for comparing pure semantic accuracy across providers.
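To make the three modes concrete, here is a minimal sketch of what each level does to the text. This is an illustration only: in the real pipeline the normalized and strict modes delegate to the NeMo service, and the small number-word map below is a toy stand-in for full ITN.

```python
import re

# Toy stand-ins for NeMo ITN; real ITN also covers dates, currency, etc.
UNITS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}

def normalize_raw(text: str) -> str:
    # Raw mode: the text is compared exactly as-is.
    return text

def normalize_default(text: str) -> str:
    # Normalized mode: lowercase, drop punctuation, collapse whitespace.
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)
    return " ".join(text.split())

def normalize_strict(text: str) -> str:
    # Strict mode: normalized pass plus toy number-word -> digit ITN.
    out, acc, in_num = [], 0, False
    for tok in normalize_default(text).split():
        if tok in TENS or tok in UNITS:
            acc += TENS.get(tok, 0) + UNITS.get(tok, 0)
            in_num = True
        else:
            if in_num:
                out.append(str(acc))
                acc, in_num = 0, False
            out.append(tok)
    if in_num:
        out.append(str(acc))
    return " ".join(out)

print(normalize_strict("It costs twenty three dollars."))
# -> it costs 23 dollars
```

The key point is that each level is strictly more aggressive than the last: strict mode reuses the normalized pass before rewriting number words.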

Nvidia NeMo integration

We're using Nvidia NeMo's Text Normalization and Inverse Text Normalization (ITN) for professional-grade multilingual support. NeMo handles complex cases like:

  • Number word normalization ("twenty three" → "23")
  • Date and time formats across languages
  • Currency and measurement units
  • Proper tokenization for 100+ languages

The system gracefully falls back to local normalization if the NeMo service is unavailable, ensuring your WER calculations always complete.
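The fallback behavior follows a simple pattern: attempt the remote service, and on any failure use the local normalizer instead. A minimal sketch, where `nemo_normalize` stands in for a hypothetical wrapper around the service call:

```python
def normalize_with_fallback(text, nemo_normalize, local_normalize):
    """Try the NeMo service first; fall back to local normalization.

    `nemo_normalize` is a hypothetical callable wrapping the service
    request; any exception (timeout, connection error) triggers the
    local fallback so the WER calculation can still complete.
    """
    try:
        return nemo_normalize(text), "nemo"
    except Exception:
        return local_normalize(text), "local"

# Simulate an unreachable service to show the fallback path:
def unavailable(_text):
    raise ConnectionError("NeMo service unreachable")

result, backend = normalize_with_fallback("Twenty Three!", unavailable, str.lower)
print(backend, result)
# -> local twenty three!
```

Returning which backend handled the text makes it easy to surface in results whether a score used full NeMo normalization or the local approximation.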

Why this matters

Word Error Rate is the industry-standard metric for evaluating speech recognition systems. But naive WER calculations can be misleading — they penalize semantically correct transcriptions that differ only in formatting.

For example, "twenty three" and "23" are semantically identical, but basic WER counts every mismatched word as an error. Our strict normalization mode with NeMo ITN solves this, giving you an accurate measure of semantic accuracy across different STT providers.
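The formatting penalty is easy to see with a standard word-level edit-distance WER (a minimal sketch of the textbook metric, not our production implementation):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("twenty three", "23"))  # -> 1.0 (substitution + deletion)
print(wer("23", "23"))            # -> 0.0 once ITN maps both sides to "23"
```

Without ITN, the reference "twenty three" against the hypothesis "23" scores one substitution plus one deletion over two reference words, i.e. 100% error, despite carrying the same meaning.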

Try it now

No login required. Just paste your reference text and hypothesis transcription at /wer to get instant WER metrics with all three normalization levels.

Want to test normalization separately? Visit /normalize to see how your text transforms through our Nvidia NeMo pipeline.

Coming soon

  • Batch WER calculation for testing multiple audio files
  • Language-specific normalization optimizations
  • WER visualization with aligned transcriptions
  • Export WER reports for documentation

🚀