OpenAI caused a stir this week when it told a UK parliamentary committee that it would be “impossible” to develop today’s leading AI systems without using vast amounts of copyrighted data.
The company argued that advanced AI tools like ChatGPT require such broad training that adhering to copyright law would be impractical.
In written testimony, OpenAI stated that due to expansive copyright laws and the prevalence of protected online content, “virtually every sort of human expression” would be off-limits for training data. From news articles to forum comments to digital images, there is little online content that can be used freely and legally.
According to OpenAI, attempts to create capable AI while avoiding copyright infringement would fail: “Limiting training data to public domain books and drawings created more than a century ago … would not provide AI systems that meet the needs of today’s citizens.”
While defending its practices as compliant, OpenAI acknowledged that partnerships and compensation schemes with publishers may be necessary to “support and empower creators.” However, the company did not indicate any intention to significantly restrict its collection of online data, including paywalled journalism and literature.
This stance has exposed OpenAI to multiple lawsuits alleging copyright violations, including from media outlets such as The New York Times.
Nevertheless, OpenAI seems unwilling to fundamentally change its data collection and training processes, given that it considers training within strict copyright limits “impossible.” Instead, the company hopes to rely on broad interpretations of fair use allowances to legally leverage vast amounts of copyrighted data.
As advanced AI continues to demonstrate remarkable abilities in emulating human expression, legal experts anticipate intense courtroom battles over infringement by systems that are inherently designed to absorb large volumes of protected text, media, and other creative output.
For now, OpenAI is betting against copyright maximalists in favor of near-unlimited copying to drive ongoing AI development.