
ElevenLabs is shaking things up with their latest creation, Scribe, a cutting-edge speech-to-text model that’s setting new standards in the industry. You might know ElevenLabs for their impressive audio generation tech, but now they’re diving headfirst into speech recognition. It’s like they’re saying, “Watch out, Gladia and OpenAI’s Whisper, we’re coming for you!”
With a hefty $180 million funding boost, ElevenLabs is now valued at a staggering $3.3 billion. Talk about making waves! Their new pride and joy, Scribe, supports a whopping 99 languages.
But here’s the kicker: it boasts top-notch accuracy in more than 25 languages, including English and Spanish, where the word error rate is under 5%. That’s pretty impressive, right?
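If you’re wondering what a word error rate under 5% actually means, here’s a minimal sketch of the usual definition: the word-level edit distance (substitutions, insertions, and deletions) divided by the number of words in the reference transcript. The function name and the simple lowercase-and-split normalization are assumptions for illustration; real benchmarks normalize text more carefully, and this is not how ElevenLabs computed its figures.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    # Naive normalization (an assumption): lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One wrong word out of six in the reference.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

So a transcript that gets one word wrong out of every twenty lands right at the 5% mark.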
CEO Mati Staniszewski shared some insights about their vision: “We want to understand what’s being said in conversations better. We’re moving beyond just generating content to truly understanding and transcribing speech.”
And let’s be honest, tackling the challenges in speech-to-text tech, especially for those lesser-known languages, is no small feat.
Scribe isn’t just about transcribing words. It’s packed with cool features like smart speaker diarization and word-level timestamps for spot-on subtitles. Plus, it can auto-tag sound events like laughter. How neat is that?
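To see why word-level timestamps matter for subtitles, here’s a rough sketch that groups timed words into SRT cues. The input shape (a list of dicts with `text`, `start`, and `end` in seconds) and the `max_words` grouping rule are hypothetical, chosen just for this example; they are not Scribe’s actual response format.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[dict], max_words: int = 7) -> str:
    """Group timed words into numbered SRT cues of at most max_words each."""
    cues = []
    for index, offset in enumerate(range(0, len(words), max_words), start=1):
        chunk = words[offset:offset + max_words]
        # Each cue spans from its first word's start to its last word's end.
        start = format_timestamp(chunk[0]["start"])
        end = format_timestamp(chunk[-1]["end"])
        text = " ".join(word["text"] for word in chunk)
        cues.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    # Hypothetical word-level output, for illustration only.
    sample = [
        {"text": "hello", "start": 0.0, "end": 0.5},
        {"text": "there", "start": 0.5, "end": 1.0},
        {"text": "friend", "start": 1.25, "end": 1.75},
    ]
    print(words_to_srt(sample, max_words=2))
```

Because every word carries its own start and end time, each cue can begin and end exactly where its words do, instead of being guessed from sentence length.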
Initially, it’s geared for pre-recorded audio, but they’ve got plans for a low-latency version for real-time use. So, if you’re thinking about how this might fit into your workflow, stay tuned!
Now, let’s talk numbers. Scribe is priced at $0.40 per hour of transcription. Sure, it’s competitive, but with rivals offering similar services at lower prices, it’s going to be an interesting race!
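For a quick feel of what $0.40 per hour adds up to, here’s a back-of-the-envelope sketch. It assumes billing is pro-rated linearly by audio duration with no minimums, tiers, or discounts, which the pricing above doesn’t confirm; the function name is made up for this example.

```python
from decimal import Decimal

# Quoted price: $0.40 per hour of transcribed audio.
PRICE_PER_HOUR = Decimal("0.40")


def transcription_cost(audio_seconds: int) -> Decimal:
    """Estimated cost in USD, assuming simple linear pro-rating."""
    hours = Decimal(audio_seconds) / Decimal(3600)
    return (hours * PRICE_PER_HOUR).quantize(Decimal("0.01"))


if __name__ == "__main__":
    print(transcription_cost(90 * 60))      # a 90-minute recording
    print(transcription_cost(250 * 3600))   # a 250-hour archive
```

Under those assumptions, a 90-minute recording comes to about $0.60 and a 250-hour archive to $100.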