OpenAI has unveiled Voice Engine, a new model that generates natural-sounding speech from text input and a single 15-second audio sample. Despite working from such a short clip, the model can produce emotive and realistic voices.
OpenAI said it began developing Voice Engine in late 2022 and has used it to power the preset voices in its text-to-speech API, ChatGPT Voice, and Read Aloud. However, citing concerns about potential misuse, the company has not yet released the model to the public. It is taking the same cautious approach with Sora, its text-to-video generation model.
“We acknowledge the serious risks of generating speech that mimics people’s voices, especially in an election year. We are collaborating with U.S. and international partners from various sectors to ensure we consider their feedback during development,” the company stated in a blog post.
OpenAI has been privately testing Voice Engine with a small group of trusted partners, whose early applications span several industries. These include:
1. Enhancing Education: Age of Learning uses Voice Engine to create pre-scripted voice-over content that provides reading assistance to children and non-readers, and combines it with GPT-4 to offer personalized interactions that make its content more accessible.
2. Global Content Translation: HeyGen uses Voice Engine to translate videos and podcasts into multiple languages while preserving the original speaker's accent, helping creators distribute content to and engage audiences worldwide.
3. Community Health Services: Dimagi uses Voice Engine and GPT-4 to improve service delivery in remote areas, particularly in healthcare, by giving interactive feedback in local languages that supports counseling and other community services.
4. Assistive Communication: Livox integrates Voice Engine into its communication app to offer natural, customizable voices, rather than robotic ones, to people with speech-related disabilities, allowing them to express themselves authentically in different languages and contexts.
5. Clinical Applications: The Norman Prince Neurosciences Institute at Lifespan is exploring how Voice Engine could help restore speech in patients whose speech has been impaired by medical conditions, a use that could make it a valuable tool for speech rehabilitation and patient care.