Analyzing podcasts through their language

Using transcript analysis to fuel podcast discovery

Lots of people get podcast recommendations from friends, social media, newsletters, or websites — which are pretty much the only sources available to them. Today’s podcast listening apps don’t offer much in terms of options for search and discovery.

A few apps have charts that reflect measurements of show popularity, or offer some insight into the listening habits of people who subscribe to the same show you do. But most podcast platforms simply don’t have much information about the podcasts they offer to listeners. There’s a title, maybe a short description, and possibly some listening data gathered by tracking users in the app. But the episodes themselves — their content, the topics they cover, people they feature, moods they strike — are black boxes.

Sure, popularity is a powerful measure of quality, and titles and descriptions are certainly related to podcast content — but when it comes to serendipitous discovery, there is no podcast equivalent of Spotify’s Discover Weekly or Netflix’s hyper-specific categories.

The iTunes top podcasts in June 2016

Popularity measurements, and listener data more generally, can offer insight into older episodes, or shows and networks that are already popular, but there’s a “cold start” problem for new podcasts. Today’s podcast charts are dominated by big names from public media that have established listener bases. Better information about podcast episodes could help new shows build audiences — and help listeners discover new podcasts relevant to their interests.

We’re analyzing episode transcripts in concert with listener data and other information about podcasts. An hour-long episode could have a transcript with 10,000 words, and when we analyze that transcript for proper names, word frequency, and thematic clusters, we get much more nuanced information about the podcast than you might get from a two-sentence description dashed off by a producer an hour before the episode went live.
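To make the idea concrete, here is a minimal sketch of two of those signals, word frequency and proper-name candidates, using only the Python standard library. The function names, the tiny stopword list, and the capitalized-word heuristic are illustrative assumptions, not a description of our production pipeline; a real system would more likely use a trained named-entity recognizer.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption); real systems use a fuller one.
STOPWORDS = {
    "the", "a", "an", "and", "of", "to", "in", "on", "is", "it",
    "that", "with", "for", "we", "this", "about", "was",
}


def word_frequencies(transcript, top_n=5):
    """Return the top_n most common non-stopword terms as (word, count) pairs."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(top_n)


def proper_name_candidates(transcript):
    """Count runs of two or more capitalized words, e.g. 'Jamilah Lemieux'.

    This is a rough heuristic: it will miss single-word names and can
    pick up a capitalized sentence opener that sits next to a name.
    """
    pattern = r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b"
    return Counter(re.findall(pattern, transcript))
```

Even a crude pass like this pulls out far more detail from a 10,000-word transcript than a two-sentence description can carry.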

Using transcripts as a measure of podcast similarity can help match the right podcasts to the right listeners at the right time, activating the long tail of content for podcasts that aren’t necessarily the most popular.
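One common way to turn transcripts into a similarity measure is TF-IDF weighting with cosine similarity. The sketch below is a generic illustration of that technique, not a description of our own algorithm; the function names and the smoothed IDF formula are assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(transcripts):
    """Map each episode id to a sparse {term: weight} TF-IDF vector."""
    tokens = {ep: tokenize(text) for ep, text in transcripts.items()}
    n_docs = len(tokens)
    doc_freq = Counter()
    for words in tokens.values():
        doc_freq.update(set(words))  # count each term once per episode
    vectors = {}
    for ep, words in tokens.items():
        counts = Counter(words)
        vectors[ep] = {
            # term frequency times a smoothed inverse document frequency
            term: (count / len(words))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(episode_id, vectors, top_n=3):
    """Rank every other episode by similarity to episode_id, best first."""
    scores = [
        (other, cosine(vectors[episode_id], vec))
        for other, vec in vectors.items()
        if other != episode_id
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]
```

Because the score depends only on what is said in each episode, a brand-new show with no listening history can still be matched to relevant listeners.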

One example from testing our new podcast recommendation algorithms: an episode of the popular BuzzFeed podcast Another Round mentioned the writer Jamilah Lemieux, and that mention was captured in our transcript of the episode. Based on this, along with other metadata from the episode, one of the recommendations our algorithm identified was The Jamilah Lemieux Episode from Loudspeaker Network’s The Combat Jack Show. This is the kind of nuanced recommendation that’s possible with transcript data.
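A heavily simplified sketch of this kind of match: take the names mentioned in one episode's transcript and look for them in the titles of other episodes. The function name and the catalog structure are hypothetical, and the real recommendation also draws on other metadata that this example ignores.

```python
def recommend_by_shared_names(mentioned_names, catalog_titles):
    """Return (title, matched_names) for titles containing a mentioned name.

    mentioned_names: names found in the source episode's transcript.
    catalog_titles: titles of candidate episodes (hypothetical catalog).
    """
    matches = []
    for title in catalog_titles:
        lowered = title.lower()
        hits = [name for name in mentioned_names if name.lower() in lowered]
        if hits:
            matches.append((title, hits))
    return matches


# Names taken from the example above; the second title is a made-up placeholder.
names_in_transcript = ["Jamilah Lemieux"]
titles = ["The Jamilah Lemieux Episode", "A Placeholder Episode Title"]
recommendations = recommend_by_shared_names(names_in_transcript, titles)
```

Matching on plain substrings is the simplest possible choice; it would not handle nicknames, misspellings, or transcription errors.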

Coders & Entrepreneurs! Are you working on a podcast app? Contact us to test our new podcast recommendation offerings.

Get early access to our latest podcast recommendation API endpoints