AI language models work by predicting the likely next word in a sentence, generating text one word at a time. Text watermarking algorithms split the model’s vocabulary into words on a “green list” and a “red list,” then nudge the model to pick words from the green list. The more green-list words a passage contains, the more likely it is to be computer-generated, because humans tend to use a more varied mix of words.
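To make the mechanics concrete, here is a minimal Python sketch of a green-list watermark of the kind described above. The toy vocabulary, SHA-256 seeding scheme, and bias strength are illustrative assumptions, not the parameters of any real deployment: the previous word seeds a pseudorandom split of the vocabulary, generation nudges scores toward green words, and a detector counts how many words land on their green lists.

```python
import hashlib
import random

# Minimal sketch of a green-list watermark, assuming a toy vocabulary,
# a SHA-256 seeding scheme, and a fixed bias strength. None of these
# reflect a real deployment's parameters.
VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "quickly", "slowly"]
GREEN_FRACTION = 0.5  # fraction of the vocabulary marked green at each step
GREEN_BIAS = 2.0      # score boost given to green-list words

def green_list(prev_word: str) -> set[str]:
    """Seed an RNG with the previous word, so anyone who knows the secret
    scheme can reproduce the same green/red split for that context."""
    seed = int(hashlib.sha256(prev_word.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    shuffled = VOCAB.copy()
    rng.shuffle(shuffled)
    return set(shuffled[: int(len(VOCAB) * GREEN_FRACTION)])

def watermarked_scores(scores: dict[str, float], prev_word: str) -> dict[str, float]:
    """Generation side: nudge the model toward green-list words."""
    green = green_list(prev_word)
    return {w: s + (GREEN_BIAS if w in green else 0.0) for w, s in scores.items()}

def green_count(words: list[str]) -> int:
    """Detection side: count words that land on their context's green list.
    An unusually high count suggests the text was machine-generated."""
    return sum(1 for prev, w in zip(words, words[1:]) if w in green_list(prev))
```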
The researchers attacked five different watermarks that work this way. They were able to reverse-engineer the watermarks by using an API to query the watermarked AI model repeatedly, Staab says. The responses let an attacker build an approximate model of the watermarking rules, effectively stealing the watermark, by comparing the AI’s outputs with ordinary text.
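A rough sketch of what such a reverse-engineering step could look like, under the assumption that the attacker compares how often each word follows a given context in watermarked output versus a baseline corpus of ordinary text. The bigram statistics and `threshold` value here are simplifications for illustration, not the ETH Zürich team’s actual algorithm.

```python
from collections import Counter

def estimate_green_lists(watermarked_samples: list[list[str]],
                         baseline_samples: list[list[str]],
                         threshold: float = 1.5) -> dict[str, set[str]]:
    """Guess which words are green-listed after each context by comparing
    word frequencies in watermarked output against ordinary text. Words
    over-represented in the watermarked output are assumed green."""
    def bigram_counts(samples: list[list[str]]) -> dict[str, Counter]:
        counts: dict[str, Counter] = {}
        for words in samples:
            for prev, word in zip(words, words[1:]):
                counts.setdefault(prev, Counter())[word] += 1
        return counts

    wm = bigram_counts(watermarked_samples)
    base = bigram_counts(baseline_samples)
    guessed: dict[str, set[str]] = {}
    for prev, wm_counter in wm.items():
        base_counter = base.get(prev, Counter())
        wm_total = sum(wm_counter.values())
        base_total = sum(base_counter.values())
        green = set()
        for word, n in wm_counter.items():
            wm_rate = n / wm_total
            # add-one smoothing so unseen baseline words don't divide by zero
            base_rate = (base_counter[word] + 1) / (base_total + len(wm_counter))
            if wm_rate / base_rate > threshold:
                green.add(word)
        guessed[prev] = green
    return guessed
```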
Once they have a rough idea of which words are green-listed, the researchers can carry out two kinds of attack. The first, known as a spoofing attack, lets malicious actors use the stolen watermark information to write text of their own that passes as watermarked AI output. The second, a scrubbing attack, lets hackers strip the watermark from AI-generated text, making it appear to have been written by a human.
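Building on the guessed green lists above, both attacks can be sketched as simple word substitution. The `synonyms` table here is a hypothetical placeholder for whatever rewriting machinery a real attacker would use; the actual attacks in the research are considerably more sophisticated.

```python
def spoof(words: list[str], guessed: dict[str, set[str]],
          synonyms: dict[str, list[str]]) -> list[str]:
    """Spoofing sketch: push human-written words onto the guessed green
    lists so a watermark detector flags the text as AI-generated."""
    out = [words[0]]
    for word in words[1:]:
        green = guessed.get(out[-1], set())
        if word not in green:
            # swap in a green-listed alternative if the attacker has one
            word = next((s for s in synonyms.get(word, []) if s in green), word)
        out.append(word)
    return out

def scrub(words: list[str], guessed: dict[str, set[str]],
          synonyms: dict[str, list[str]]) -> list[str]:
    """Scrubbing sketch: the mirror image, moving AI-generated words off
    the green lists so watermarked text reads as human-written."""
    out = [words[0]]
    for word in words[1:]:
        green = guessed.get(out[-1], set())
        if word in green:
            word = next((s for s in synonyms.get(word, []) if s not in green), word)
        out.append(word)
    return out
```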
The team achieved an approximately 80% success rate in spoofing watermarks and an 85% success rate in removing watermarks from AI-generated text.
Researchers unaffiliated with the ETH Zürich team, such as Soheil Feizi, an associate professor and director of the Reliable AI Lab at the University of Maryland, have likewise found watermarks to be unreliable and vulnerable to spoofing attacks.
The ETH Zürich findings confirm that these problems with watermarks persist and extend to the most advanced chatbots and large language models in use today, Feizi says. The research underscores the need for caution when deploying such detection mechanisms at scale.
Despite these shortcomings, watermarks remain the most promising way to identify AI-generated content, says Nikola Jovanović, a PhD student at ETH Zürich who worked on the research. But more study is needed before watermarks are ready for deployment at scale, and in the meantime expectations about how reliable and useful these tools are should be kept in check. “If it’s better than nothing, it is still useful,” Jovanović says.
Update: This research will be presented at the International Conference on Learning Representations (ICLR). The story has been updated to reflect this.