Guided by Digital Voices

5 unexpected insights from automatic speech recognition


Pop Up Archive has been hard at work implementing new speech recognition software for our partners at organizations like NPR, StoryCorps, and the Hoover Institution. This work means better auto-transcripts, and better auto-transcripts mean better access to the hours upon hours of spoken content locked in digital audio.

Along the way, we’ve learned some surprising things about the state of automatic speech recognition. Here’s our crash course in the workings of speech-to-text software:

1. Speech-to-text software learns language like people do.

All automatic speech recognition software learns from whatever data it’s given. So, like a person, the more “well-read” your software is in a particular area, the more it will understand.

2. The human standard for perfect transcription is being questioned.

The gold standard for transcripts has always been human transcription. But as machine learning gets better, a human transcriber won’t necessarily transcribe unfamiliar dialects more accurately than a computer. Speech-to-text software is trained on many voices, so it can interpret dialects from all over the world. Check out this 2011 Google Tech Talk on “Superhuman speech recognition.”

3. Speaking clearly can make you harder to understand.

Since most speech software is trained on naturalistic pronunciations (that is, how you would say a word in a real conversation), speakers who over-articulate may not be properly understood. For example, clearly pronouncing the “t”s in “butter” goes against the Standard American English pronunciation, which is closer to a “d” sound.

4. Not all vocabularies are created equal.

When you create a language model, it’s not just the number of words in the model that contributes to accuracy; it’s how well their distribution matches that of the content.
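To make the point about distributions concrete, here is a minimal sketch in Python. It is not how Pop Up Archive's software works; it assumes a toy unigram language model with add-one smoothing, and the sample sentences and the helper names `unigram_model` and `perplexity` are invented for illustration. Perplexity measures how "surprised" a model is by the content, so lower is better.

```python
import math
from collections import Counter

def unigram_model(training_text):
    """Build add-one smoothed unigram probabilities (hypothetical helper)."""
    words = training_text.split()
    vocab = set(words)
    counts = Counter(words)
    # Reserve one extra slot for words the model has never seen.
    total = len(words) + len(vocab) + 1
    probs = {w: (counts[w] + 1) / total for w in vocab}
    probs["<unk>"] = 1 / total
    return probs, vocab

def perplexity(probs, text):
    """Perplexity of the text under the model; lower means a better match."""
    words = text.split()
    log_sum = sum(math.log2(probs.get(w, probs["<unk>"])) for w in words)
    return 2 ** (-log_sum / len(words))

# Toy data, invented for this example.
content = "the senator said the bill will pass the senate"
matched_text = "the senate will vote on the bill the senator said the bill will pass"
mismatched_text = ("whisk the eggs then fold in flour sugar butter vanilla "
                   "salt cream and bake until golden brown crisp")

matched_probs, matched_vocab = unigram_model(matched_text)
mismatched_probs, mismatched_vocab = unigram_model(mismatched_text)

matched_ppl = perplexity(matched_probs, content)
mismatched_ppl = perplexity(mismatched_probs, content)

print(len(matched_vocab), round(matched_ppl, 1))
print(len(mismatched_vocab), round(mismatched_ppl, 1))
```

In this toy setup, the cooking-themed model knows more distinct words, yet it is far more surprised by the political content than the smaller model trained on similar material. Real speech recognition systems use much richer language models, but the same principle applies.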

5. We’ve only scratched the surface. 

Speaker recognition. Accurate punctuation. Comprehensive geographical and biographical knowledge….

All of these features are not only possible in automatic speech recognition, but will soon be on their way into your own Pop Up Archive auto-transcripts. As we integrate the new software into Pop Up Archive over the next few months, you’ll see major improvements to our automatic transcription and editing tools. We’ll keep you posted as our new features become available!