Recent advances in generative artificial intelligence have driven rapid progress in realistic speech synthesis. This technology has the potential to improve lives through personalized voice assistants and accessibility-enhancing communication tools. However, it has also given rise to audio deepfakes, in which synthesized speech is used to deceive humans and machines for malicious purposes.
In response to this evolving threat, Ning Zhang, an assistant professor of computer science and engineering at the McKelvey School of Engineering at Washington University in St. Louis, has created a tool called AntiFake. This innovative defense mechanism is designed to prevent unauthorized speech synthesis. Zhang presented AntiFake on November 27 at the Association for Computing Machinery’s Conference on Computer and Communications Security in Copenhagen, Denmark.
Unlike traditional deepfake detection, which serves as a post-attack mitigation tool to identify and expose synthetic audio, AntiFake takes a proactive approach. It uses adversarial techniques to make it harder for AI tools to extract the vocal characteristics they need from voice recordings, thwarting the synthesis of deceptive speech before it can happen. The code for AntiFake is freely available to users.
“AntiFake ensures that when we share voice data, it becomes difficult for criminals to use that information to synthesize our voices and impersonate us,” explained Zhang. “The tool leverages adversarial AI techniques that were originally part of the cybercriminals’ toolbox, but we are now utilizing them for defense. We subtly distort or perturb the recorded audio signal, making it still sound natural to human listeners but completely different to AI.”
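Conceptually, this works like an adversarial attack aimed at the voice-cloning pipeline itself. The sketch below is a minimal illustration of that idea, not the AntiFake implementation: it assumes a differentiable speaker-embedding model (a toy stand-in network here, where a real system would use a pretrained encoder), and takes gradient steps that push a recording's embedding away from the speaker's true embedding while an amplitude bound keeps the perturbation quiet enough to sound natural.

```python
# Hypothetical sketch of voice protection via adversarial perturbation.
# The encoder below is a toy placeholder, NOT the AntiFake model.
import torch
import torch.nn as nn

class ToySpeakerEncoder(nn.Module):
    """Stand-in for a real speaker encoder that maps audio to an embedding."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, stride=4), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=9, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(32, 64),
        )

    def forward(self, wav):  # wav: (batch, 1, samples)
        emb = self.net(wav)
        return emb / emb.norm(dim=-1, keepdim=True)  # unit-norm embedding

def protect(wav, encoder, eps=0.002, steps=100, lr=1e-3):
    """Perturb `wav` so its speaker embedding drifts away from the original,
    while an L-infinity bound `eps` keeps the change small (a crude proxy
    for 'still sounds natural to human listeners')."""
    for p in encoder.parameters():       # only the perturbation is optimized
        p.requires_grad_(False)
    with torch.no_grad():
        target = encoder(wav)            # embedding of the clean voice
    delta = torch.zeros_like(wav, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        emb = encoder(wav + delta)
        # Minimizing cosine similarity = maximizing embedding distance,
        # so a cloning model extracts the wrong voice characteristics.
        loss = torch.cosine_similarity(emb, target, dim=-1).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():            # project back into the noise budget
            delta.clamp_(-eps, eps)
    return (wav + delta).detach()

if __name__ == "__main__":
    encoder = ToySpeakerEncoder().eval()
    voice = torch.randn(1, 1, 16000) * 0.1   # 1 s of placeholder 16 kHz audio
    protected = protect(voice, encoder)
    drift = 1 - torch.cosine_similarity(
        encoder(voice), encoder(protected), dim=-1).item()
    print(f"embedding drift: {drift:.3f}, "
          f"max sample change: {(protected - voice).abs().max().item():.4f}")
```

In a deployed defense, the optimization would presumably target one or more pretrained speaker encoders and use a perceptual constraint rather than a simple amplitude bound, so the perturbation transfers to synthesizers the defender has never seen.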
To keep AntiFake effective against a constantly changing landscape of attackers and unknown synthesis models, Zhang and Zhiyuan Yu, a graduate student in Zhang's lab and first author of the study, built the tool to generalize. They tested AntiFake against five state-of-the-art speech synthesizers and achieved a protection rate of over 95%, even against unseen commercial synthesizers. Tests with 24 human participants confirmed that the tool is usable by and accessible to diverse populations.
Currently, AntiFake safeguards short speech clips, targeting the most common form of voice impersonation. But Zhang said nothing prevents extending the tool to protect longer recordings, or even music, in the ongoing fight against disinformation.
“Ultimately, our goal is to fully protect voice recordings,” Zhang stated. “Although I cannot predict the future of AI voice technology, as new tools and features are constantly being developed, I believe that our strategy of turning adversaries’ techniques against them will remain effective. AI remains vulnerable to adversarial perturbations, and we may need to adjust the engineering specifics to ensure the continued success of this strategy.”