AI companies have implemented safeguards to prevent their models from producing harmful or dangerous information. Rather than building their own AI models without these protections, which is costly, time-consuming, and difficult, cybercriminals have turned to a new trend: jailbreak-as-a-service.
Most models come with rules governing how they may be used. Jailbreaking lets users manipulate an AI system into generating outputs that violate these policies, such as ransomware code or text for scam emails.
Services such as EscapeGPT and BlackhatGPT provide anonymous access to language-model APIs along with regularly updated jailbreaking prompts. To combat this emerging underground industry, AI companies like OpenAI and Google must constantly patch security vulnerabilities that malicious actors could exploit.
Jailbreaking services employ various tactics to bypass safety measures, such as posing hypothetical queries or using foreign languages. There is an ongoing battle between AI companies striving to prevent model misuse and malicious actors devising increasingly inventive jailbreaking techniques.
Ciancaglini notes that these services are highly appealing to criminals. “Keeping up with jailbreaks is a tedious task. You create a new one, test it, it works for a while, then OpenAI updates their model,” he explains. “Jailbreaking is a fascinating service for criminals.”
Doxxing and surveillance
AI language models are ideal tools not only for phishing but also for doxxing (revealing private, identifying information about someone online), according to Balunović. These models are trained on vast amounts of internet data, including personal information, enabling them to deduce details like an individual’s location.
For instance, a chatbot could be instructed to pose as a private investigator and analyze text written by the victim, extracting personal information from subtle clues, such as inferring their age from when they attended high school or their location from landmarks they mention. The more information there is about an individual online, the easier it is to identify them.