
Recently, we wrote about OpenAI's groundbreaking speech recognition tool Whisper. At the time, it had just beaten Google's best speech recognition API.

👩‍💻 Recommended: OpenAI's Speech-to-Text API: A Comprehensive Guide

But it didn't take long for Google to catch up. 🚀

Say hello to the Universal Speech Model (USM), a cutting-edge language tool that understands and translates speech in over 300 languages! Built with 2 billion parameters and trained on 12 million hours of speech, USM is here to help you understand everything from popular languages like English and Mandarin to lesser-known ones like Balinese, Shona, and Xhosa.

But how does it work with so many languages, especially those with fewer speakers? The secret lies in pre-training on a huge, unlabeled dataset covering many languages and then fine-tuning on a smaller set of labeled data. This makes USM efficient and adaptable to new languages and data.

USM is perfect for use on YouTube, making it possible for people worldwide to enjoy closed captions in their own language. When tested on 73 languages from YouTube captions, USM achieved an average word error rate (WER) of less than 30%, which means that, on average, fewer than 30 out of every 100 words needed a correction.

Separately, Google's Speech-to-Text v2 modernizes the API interface and introduces several new features.
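To make the word error rate (WER) figure quoted above more concrete, here is a minimal sketch of how WER is commonly computed: the number of word-level substitutions, deletions, and insertions needed to turn the recognized text into the reference text, divided by the number of reference words. The function name `word_error_rate` and the sample sentences are our own illustrative choices, not part of USM or any Google API, and real evaluations usually normalize the text (casing, punctuation) first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein, row by row).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (word missing from hypothesis)
                curr[j - 1] + 1,                  # insertion (extra word in hypothesis)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of five reference words.
    print(word_error_rate("the quick brown fox jumps", "the quick brown dog jumps"))
    # One deleted word and one substituted word out of four reference words.
    print(word_error_rate("turn on the lights", "turn the light"))
```

Lower is better: a perfect transcript scores 0.0, and because inserted words count as errors too, a very poor transcript can score above 1.0.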
