Generative AI poses challenges to copyright law that the law never anticipated, and they are uncomfortable ones. The US Copyright Office has issued guidance stating that AI-generated output is not copyrightable unless human creativity went into the prompts that produced it. That guidance raises its own questions: how much creativity is required, and is it the same kind of creativity that an artist exercises? If a human writes software to generate prompts that in turn generate an image, is the image copyrightable? If a model’s output cannot be owned by a human, who or what is responsible if that output infringes an existing copyright? Can an artist’s style be copyrighted, and what would that even mean? Text raises another set of issues: it has been claimed that using copyrighted texts as training data for a language model is itself infringement, even if the model never reproduces those texts in its output. Yet reading texts has always been part of how humans learn, and whether the texts were acquired legally or illegally is a question copyright law doesn’t address.
In this context, the concept of data dignity, introduced by Jaron Lanier in The New Yorker, becomes relevant. Lanier draws a distinction between training a model and generating output with a model. Training teaches the model to understand and reproduce human language; generating output means prompting the model to produce something specific. Lanier argues that training should be a protected activity, while the output a model generates can potentially infringe on someone’s copyright. This distinction is appealing because current copyright law protects “transformative use,” and AI models are inherently transformative. The challenge, however, lies in separating training from generation in practice: with the current state of AI models, it is usually impossible to connect a given output back to the training inputs that produced it.
AI models are probabilistic: they generate output by computing a probability distribution over the next word, given the prompt and the words generated so far, and choosing from it. It is therefore difficult to argue that the model is copying a text; it is following statistical patterns. It lacks creativity as humans recognize it and is more a “stochastic parrot” than a human plagiarizing a literary text. Nevertheless, the question of how to compensate authors when a model produces their work as output remains relevant.
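To make the "next word from probabilities" idea concrete, here is a minimal sketch using a toy bigram model built from word counts. This is an illustrative assumption, not how any production LLM works: real models use neural networks over tokens and vastly larger corpora, but the underlying idea of predicting the next word from a learned probability distribution is the same.

```python
from collections import Counter, defaultdict

# Toy training corpus (an assumption for illustration only).
corpus = "the cat sat on the mat and the cat slept".split()

# Count how often each word follows each preceding word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word_distribution(prev):
    """Return {word: probability} for words observed after `prev`."""
    counts = follows[prev]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# After "the", the corpus contains "cat" twice and "mat" once,
# so "cat" gets probability 2/3 and "mat" gets 1/3.
print(next_word_distribution("the"))
```

Note that the model never stores or copies whole sentences; it only retains statistics about which words follow which, which is the intuition behind the "stochastic parrot" characterization.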
For companies like O’Reilly Media, the distinction between creative output and actionable output matters. Creative output, such as a new novel written in the style of a specific author, may differ significantly from that author’s work; it doesn’t devalue the original and can even increase its value, much as fan fiction contributes to the popularity of the work it imitates. Actionable output, such as generated software that replaces previously published code, can cut directly into the original programmer’s revenue. Even this distinction blurs, though: actors and screenwriters could have their work ingested by a model and transformed into new roles or scripts. Copyright law in the age of artificial intelligence will require careful consideration to strike a balance between protecting creative works and fostering the development of AI technology.