Pop Up Archive and WGBH embark on a landmark project to make the American Archive searchable
On August 31, the Institute of Museum and Library Services (IMLS) awarded $14.16 million in grant funding to libraries across the United States. We’re thrilled to announce that the WGBH Educational Foundation, together with the American Archive of Public Broadcasting and Pop Up Archive, received one of 276 National Leadership Grants.
The $898,474 grant will fund transcribing and analyzing almost 40,000 hours of digital audio from the American Archive of Public Broadcasting, and building crowdsourcing tools for it, over the next two and a half years. This will be the first major media archive of its kind: the new American Archive site will integrate full-text, searchable transcripts and crowdsourced metadata for thousands of hours of audiovisual materials.
The IMLS grant follows a 2013 Corporation for Public Broadcasting (CPB) grant that is currently underway. That two-year project named WGBH and the Library of Congress the permanent stewards of the American Archive, responsible for digitizing nearly 40,000 hours of media. The new project with Pop Up Archive is a natural next step for the American Archive: it answers the question of how to keep the digitized audiovisual media accessible going forward.
Creating metadata for media at scale benefits from both machine analysis and human correction, and Pop Up Archive and WGBH are combining forces to provide both. Innovative features of the project include:
- Speech-to-text and audio analysis tools to transcribe and analyze almost 40,000 hours of digital audio from the American Archive of Public Broadcasting.
- Open source, web-based tools that improve transcripts and descriptive data by engaging the public in a crowdsourced, participatory cataloging project.
- Data sets, created and distributed as a public database of audiovisual metadata, for use by other projects.
In addition to Pop Up Archive’s machine transcripts and automatic entity extraction (tagging), we’ll be conducting research with the HiPSTAS center at the University of Texas at Austin to identify characteristics of audio beyond the words themselves. These could include emotional reactions such as laughter and crying, speaker identities, and transitions between moods or segments.
We’re grateful for the generous support of the IMLS, and we can’t wait to make these recordings, and many historic voices, discoverable. The full digital archive promises to be an incredible resource for generations of audiences and researchers. You can explore the beginnings of the American Archive site now, and stay tuned over the coming months as the archive grows!