Speech Recognition for Media: Rethinking Accuracy
Adapted from our post Speech Recognition for Media (PBS Idea Lab)
“How accurate are your automatic transcripts?” It’s one of the most frequently asked questions at Pop Up Archive, and one of the hardest to answer. It’s a fair question, yet it often anticipates an unfair answer: 100% accurate. Media producers want the ease and speed of automatic transcripts and captions, but are often loath to publish anything short of this mystical percentage.
The barrier to perfect accuracy: If this is what the people want, why don’t we give it to them? The fact is, machine transcription for media voices is a tricky business: you have to factor in background noise, overlapping speech, and poor audio quality. There’s no way to guarantee accuracy when automatically transcribing audio of varying quality and content.
We’d like to pose our own question: do you really need 100% accuracy? To value automatic transcripts only at 100% accuracy is to misunderstand the way the Internet reads text. After all, search engines don’t need perfect transcripts. Neither do producers looking for particular moments in hours of interviews. Harnessed the right way, speech-to-text software means effortless drag-and-drop access to crucial keywords and moments hidden deep within hours of content.
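To make the search argument concrete, here is a minimal sketch. It is not how Pop Up Archive's software works: the transcript format (a list of start-time and word pairs), the sample sentence, and the function names `normalize` and `find_keyword` are all assumptions made up for illustration. The sample deliberately contains recognition errors ("mare" for "mayor", "four" for "for") to show that a correctly recognized keyword can still be located.

```python
import re

# Hypothetical transcript format: (start time in seconds, recognized word).
# Two words are deliberately misrecognized: "mare" (mayor) and "four" (for).
transcript = [
    (0.0, "the"), (0.4, "mare"), (0.9, "announced"), (1.5, "a"),
    (1.7, "new"), (2.0, "budget"), (2.6, "four"), (2.9, "the"),
    (3.1, "city"), (3.6, "library"),
]

def normalize(word):
    """Lowercase a word and strip anything that is not a letter or digit."""
    return re.sub(r"[^a-z0-9]", "", word.lower())

def find_keyword(transcript, keyword):
    """Return the start times of every exact match for keyword."""
    target = normalize(keyword)
    return [start for start, word in transcript if normalize(word) == target]

print(find_keyword(transcript, "Budget"))
print(find_keyword(transcript, "mayor"))
```

Searching for "Budget" still finds the right moment in the audio despite the errors around it. Searching for "mayor" finds nothing, because that particular word was misrecognized, so an imperfect transcript is useful for search only to the extent that the keywords themselves survive.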
Toward more searchable transcripts: That said, more accurate text still means more accurate search. At Pop Up Archive, we are working toward this with speech-to-text targeted at specific genres of media: for example, news broadcasts, first-person interviews, and archival audio from different decades.
Intrigued? Get a free sample transcript for a short audio file from our new and improved speech-to-text software.
***Email us at firstname.lastname@example.org to test the new software with your own audio.***