Podcasting has become a popular and powerful medium for storytelling, news, and entertainment. However, without transcripts, podcasts may not be accessible to people who are hard-of-hearing, deaf, or deaf-blind. Ensuring that auto-generated podcast transcripts are both readable and accurate can be a challenge. The Apple Podcasts catalog includes millions of podcast episodes that are transcribed using automatic speech recognition (ASR) models. To assess the quality of these transcripts, we compare a small number of human-generated transcripts to the corresponding ASR transcripts.
The traditional word error rate (WER) metric used to measure transcript accuracy lacks nuance because it penalizes all errors equally, regardless of their impact on readability. To address this, we developed the human evaluation word error rate (HEWER) metric, which counts only major errors, meaning those that significantly affect readability, such as misspelled proper nouns and punctuation errors.
In our study of American English podcast segments, we found that the ASR transcripts had an average WER of 9.2%, but a HEWER of just 1.4%. This gap indicates that the ASR transcripts are of higher quality and more readable than WER alone suggests, because HEWER counts only the major errors and ignores those with little effect on readability.
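The difference between the two metrics can be illustrated with a minimal Python sketch. It is not the implementation used in the study: it assumes whitespace tokenization, a standard word-level edit distance for WER, and that HEWER divides the number of annotator-labeled major errors by the number of reference words. The function names, the example sentences, and the major/minor labels are hypothetical.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance: substitutions, insertions, deletions."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """WER: every word error counts equally, whatever its impact."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def hewer(reference, major_error_count):
    """HEWER (assumed form): only human-labeled major errors are counted.

    major_error_count comes from human annotation, not from this code.
    """
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference transcript is empty")
    return major_error_count / len(ref_words)


# Hypothetical segment: the ASR output drops a filler word ("um") and
# misspells a proper noun ("smith" -> "smyth").
reference = "um we met doctor smith at the cafe"
hypothesis = "we met doctor smyth at the cafe"

# Assumed annotation: the dropped filler is minor, the misspelled name is major.
major_errors = 1

print(f"WER:   {wer(reference, hypothesis):.3f}")
print(f"HEWER: {hewer(reference, major_errors):.3f}")
```

In this toy segment both errors count toward WER, but only the misspelled name counts toward HEWER, which mirrors on a small scale the gap between the 9.2% and 1.4% figures reported above.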
Our research aimed to improve the accessibility of Apple Podcasts for millions of users by providing data-driven insights. We worked with human annotators to identify errors in 800 segments of American English podcasts and classify them as major or minor. By developing the HEWER metric, we hope to enhance the overall podcast experience for both audiences and creators.
In conclusion, our study highlights the limitations of WER and the importance of considering readability when evaluating ASR transcripts. By introducing the HEWER metric, we aim to provide a more accurate and human-centric assessment of transcript quality. This research was made possible by the contributions of many individuals, and we are committed to continuing our efforts to enhance the accessibility and quality of Apple Podcasts for all users.