// AI DATA CAPABILITIES / INVERSE TEXT NORMALIZATION
Turning spoken language into structured text.
Inverse Text Normalization — the critical post-processing layer that makes your speech AI output production-ready.
When a speech recognition system transcribes "three hundred and forty-five dollars and fifty cents", your downstream system needs to read "$345.50". That transformation from spoken form to written form is Inverse Text Normalization (ITN), and it is one of the most technically demanding, language-specific challenges in speech AI. Nextura.ai delivers ITN both as a human annotation service and as a rule-based, ML-assisted pipeline, across 30+ languages.
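The "$345.50" example above can be made concrete with a minimal Python sketch of the rule-based side of ITN for US-dollar amounts. It makes three assumptions: the ASR output is lowercase and unpunctuated English, cardinals go up to the millions, and only dollars and cents are handled. The helper names `words_to_int` and `normalize_usd` are hypothetical and are not part of any Nextura.ai API.

```python
# Minimal rule-based ITN for US-dollar amounts (a sketch, not a production
# grammar). Assumes lowercase, unpunctuated English ASR output; the helper
# names are hypothetical.
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def words_to_int(words):
    """Parse English cardinal words, e.g. ['three', 'hundred', 'and', 'forty-five'] -> 345."""
    total, current = 0, 0
    for word in words:
        if word == "and":              # "three hundred and forty": 'and' adds no value
            continue
        for part in word.split("-"):   # "forty-five" -> "forty", "five"
            if part in UNITS:
                current += UNITS[part]
            elif part in TENS:
                current += TENS[part]
            elif part == "hundred":
                current *= 100
            elif part in SCALES:       # close the current group: twelve thousand -> 12_000
                total += current * SCALES[part]
                current = 0
            else:
                raise ValueError(f"not a number word: {part!r}")
    return total + current


def normalize_usd(spoken):
    """'<N> dollars [and <M> cents]' -> '$N.MM', with thousands separators."""
    tokens = spoken.lower().split()
    idx = next((i for i, t in enumerate(tokens) if t in ("dollar", "dollars")), None)
    if idx is None:
        raise ValueError("no 'dollar(s)' token found")
    dollars = words_to_int(tokens[:idx])
    rest = tokens[idx + 1:]
    cents = 0
    if rest and rest[-1] in ("cent", "cents"):
        # Drop the "and" that joins the dollar and cent parts.
        cent_words = rest[1:-1] if rest[0] == "and" else rest[:-1]
        cents = words_to_int(cent_words)
    return f"${dollars:,}.{cents:02d}"


if __name__ == "__main__":
    print(normalize_usd("three hundred and forty-five dollars and fifty cents"))  # $345.50
```

A real grammar also has to cope with phrases such as "a hundred dollars", with other currencies, and with locale-specific separators. Those are the cases where context-aware and human-reviewed normalization become necessary.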
// WHAT.ITN.COVERS
ITN transformation categories
| Category | Example (spoken → written) |
|---|---|
| Numbers & Numerals | "four hundred and twenty" → "420"; "three point five" → "3.5" |
| Currency | "twenty five dollars and ninety nine cents" → "$25.99" |
| Dates | "the fourteenth of march twenty twenty six" → "14/03/2026" |
| Times | "three forty five in the afternoon" → "3:45 PM" |
| Phone Numbers | "zero nine one four five six seven eight nine" → "09145-6789" |
| Addresses | "twelve baker street london" → "12 Baker Street, London" |
| Abbreviations | "mister sharma" → "Mr. Sharma"; "doctor" → "Dr." |
| Measurements & Units | "thirty five kilometers per hour" → "35 km/h" |
| Percentages | "forty five point three percent" → "45.3%" |
| Domain-Specific Entities | Medical, legal, financial, and technical terminology normalization |
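Once number words can be parsed, several rows of the table reduce to small deterministic rules. The Python sketch below covers three of them: decimals, percentages and 12-hour clock times. It assumes lowercase English input, cardinals below one thousand, and one-word hours. All function names are hypothetical.

```python
# Rule-based ITN for three table rows: decimals, percentages and clock times.
# A sketch assuming lowercase English input, cardinals below one thousand and
# one-word hours; all function names are hypothetical.
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
MERIDIEM = (("in the morning", "AM"), ("in the afternoon", "PM"),
            ("in the evening", "PM"))


def words_to_int(words):
    """Cardinals below one thousand: 'four hundred and twenty' -> 420."""
    value = 0
    for w in words:
        if w == "and":
            continue
        elif w in UNITS:
            value += UNITS[w]
        elif w in TENS:
            value += TENS[w]
        elif w == "hundred":
            value *= 100
        else:
            raise ValueError(f"not a number word: {w!r}")
    return value


def normalize_decimal(words):
    """'three point five' -> '3.5'; digits after 'point' are read one by one."""
    if "point" not in words:
        return str(words_to_int(words))
    i = words.index("point")
    digits = words[i + 1:]
    if not all(w in UNITS and UNITS[w] < 10 for w in digits):
        raise ValueError("digits after 'point' must be single digit words")
    return f"{words_to_int(words[:i])}." + "".join(str(UNITS[w]) for w in digits)


def normalize_percent(spoken):
    """'forty five point three percent' -> '45.3%'."""
    words = spoken.lower().split()
    if not words or words[-1] != "percent":
        raise ValueError("expected a trailing 'percent'")
    return normalize_decimal(words[:-1]) + "%"


def normalize_time(spoken):
    """'three forty five in the afternoon' -> '3:45 PM' (12-hour clock)."""
    words = spoken.lower().split()
    suffix = ""
    for phrase, label in MERIDIEM:
        p = phrase.split()
        if words[-len(p):] == p:
            words, suffix = words[:-len(p)], " " + label
            break
    hour = UNITS[words[0]]  # assumes a one-word hour ("three", "eleven")
    minute = words_to_int(words[1:]) if len(words) > 1 else 0
    return f"{hour}:{minute:02d}{suffix}"


if __name__ == "__main__":
    print(normalize_percent("forty five point three percent"))
    print(normalize_time("three forty five in the afternoon"))
```

Each rule is deterministic, so the same spoken input always gives the same written output. That property makes this layer suitable for high-frequency patterns. Readings such as "three oh five" or "quarter past three" would need further rules.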
// LANGUAGES
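Multilingual ITN across 30+ languages.
ITN is deeply language-specific: the rules for English differ substantially from those for German, Mandarin, or Tamil. German, for example, says the units before the tens ("fünfundzwanzig", literally "five-and-twenty", is 25), so an English number grammar cannot simply be reused. Nextura.ai builds its service natively for each language.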
- Language-specific numeral systems (Western Arabic digits, Devanagari digits, CJK numerals, etc.)
- Cultural date and time formats (DD/MM/YYYY vs MM/DD/YYYY vs YYYY/MM/DD)
- Currency symbol conventions and regional formatting rules
- Transliteration handling for romanized Indic and CJK language transcriptions
- Code-switching normalization for bilingual or multilingual speech
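After a value has been recognized, its written form still depends on the target locale. This Python sketch renders one date in the three orderings listed above and maps ASCII digits to Devanagari digits (U+0966 to U+096F). The locale tags and the format table are illustrative assumptions, not a complete or authoritative locale database. Production systems typically draw on Unicode CLDR data instead.

```python
# Sketch: rendering one normalized value under different locale conventions.
# The DATE_ORDER table is an illustrative assumption, not an exhaustive or
# authoritative locale mapping.
from datetime import date

DATE_ORDER = {
    "en-GB": "{d:02d}/{m:02d}/{y}",   # DD/MM/YYYY
    "en-US": "{m:02d}/{d:02d}/{y}",   # MM/DD/YYYY
    "ja-JP": "{y}/{m:02d}/{d:02d}",   # YYYY/MM/DD (one common convention)
}

# Devanagari digits U+0966..U+096F map one-to-one onto ASCII 0-9.
DEVANAGARI = str.maketrans("0123456789",
                           "".join(chr(0x0966 + i) for i in range(10)))


def format_date(value: date, locale: str) -> str:
    """Order day, month and year according to the (assumed) locale table."""
    return DATE_ORDER[locale].format(d=value.day, m=value.month, y=value.year)


def to_devanagari_digits(text: str) -> str:
    """Swap ASCII digits for Devanagari digits; other characters pass through."""
    return text.translate(DEVANAGARI)


if __name__ == "__main__":
    d = date(2026, 3, 14)  # "the fourteenth of march twenty twenty six"
    for loc in DATE_ORDER:
        print(loc, format_date(d, loc))
    print(to_devanagari_digits(format_date(d, "en-GB")))
```

Keeping recognition ("which date was spoken") separate from rendering ("how this locale writes it") lets one recognized value be output in any supported locale without duplicating the parsing rules.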
English (all variants), French, German, Spanish, Mandarin, Japanese, Korean, Portuguese, Arabic, Hindi, Tamil, Telugu, Bengali, Marathi, and 35 more
// DELIVERY.APPROACH
Human-in-the-loop + pipeline automation.
- Rule-based ITN pipeline: deterministic normalization for high-frequency patterns
- ML-assisted ITN: context-aware normalization for ambiguous or domain-specific cases
- Human annotation review: expert review layer for complex or low-resource language ITN
- Custom glossary and entity dictionary integration for brand, product, and domain terms
- Integration-ready output: formatted for downstream ASR post-processing, TTS systems, and NLU pipelines
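The delivery layers above can be pictured as a single pass with three stages. A client glossary runs first. Deterministic rules then handle unambiguous patterns. Anything uncertain is routed to a human reviewer. In this Python sketch, the following are all hypothetical: the glossary entries, the rules, the ambiguous-token list, the `model_confidence` callback (a stand-in for an ML model) and the 0.9 threshold.

```python
# Sketch of a hybrid ITN flow: glossary lookup, deterministic rules, and a
# confidence gate that routes uncertain output to human review. All entries,
# names and the 0.9 threshold are hypothetical.
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ItnResult:
    text: str
    needs_review: bool
    reasons: list[str] = field(default_factory=list)


# Client-supplied glossary (hypothetical): spoken form -> written form.
GLOSSARY = {"nextura dot ai": "Nextura.ai"}

# Deterministic rules for high-frequency, unambiguous patterns.
RULES = [
    (re.compile(r"\bmister (\w+)"), lambda m: "Mr. " + m.group(1).capitalize()),
    (re.compile(r"\bet cetera\b"), lambda m: "etc."),
]

# Tokens whose written form depends on context ("doctor" -> "Dr." only as a title).
AMBIGUOUS = {"doctor", "may", "second"}


def run_itn(spoken: str,
            model_confidence: Callable[[str], float] = lambda s: 1.0,
            threshold: float = 0.9) -> ItnResult:
    text = spoken.lower()
    # 1. Glossary first, so brand and domain terms are not rewritten by generic rules.
    for phrase, written in GLOSSARY.items():
        text = re.sub(rf"\b{re.escape(phrase)}\b", lambda m, w=written: w, text)
    # 2. Deterministic rules.
    for pattern, repl in RULES:
        text = pattern.sub(repl, text)
    # 3. Flag for human review if an ambiguous token remains or the model is unsure.
    reasons = [f"ambiguous token: {t}" for t in sorted(AMBIGUOUS & set(text.split()))]
    score = model_confidence(text)
    if score < threshold:
        reasons.append(f"model confidence {score:.2f} < {threshold}")
    return ItnResult(text, bool(reasons), reasons)


if __name__ == "__main__":
    print(run_itn("mister sharma will call nextura dot ai"))
    print(run_itn("please call the doctor"))
```

The glossary runs before the generic rules so that broader patterns cannot rewrite brand and domain terms. The ambiguous-token check is only a crude stand-in for the context-aware disambiguation that an ML model or a human reviewer would provide.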