The US Copyright Office is seeking public input on copyright law and policy issues raised by generative AI and is assessing whether federal legislation or regulation is warranted.
Generative AI (genAI) tools have been shown to rely on copyrighted works to train their underlying language models. The Copyright Office will examine the appropriate levels of transparency and disclosure around the use of copyrighted works, the legal status of AI-generated outputs, and the appropriate treatment of AI-generated outputs that mimic the personal attributes of human artists.
In March, the Copyright Office launched an AI initiative to look into issues around copyright infringement by genAI and other AI technology. So far this year, the agency has held four public sessions and two webinars on the use of copyrighted material for AI.
The agency has already gathered feedback and questions and is now seeking more public input “from the broadest audience to date in the initiative.” It plans to use the information to advise Congress, inform its own regulatory work, and offer information and resources to the public, courts, and other government entities considering the issue of copyright infringement.
“We launched this initiative at the beginning of the year to focus on the increasingly complex issues raised by generative AI,” said Shira Perlmutter, register of copyrights and director of the US Copyright Office. “We look forward to continuing to examine these issues of vital importance to the evolution of technology and the future of human creativity.”
Nitish Mittal, a partner in the technology practice of research firm Everest Group, said the issue of AI copyright infringement has been at the heart of the media and entertainment industry. In recent months, the Writers Guild of America, Sarah Silverman, Christopher Golden, and Richard Kadrey have all come out against ChatGPT creator OpenAI and Meta over claims of copyright infringement, Mittal said. The Writers Guild is pushing to ban the use of AI-generated content.
The primary issue is that it’s not clear who owns the content AI models generate.
“The technology providers…are aiming to act as platforms for these AI models, but not taking a stance on who has legal rights and legitimate ownership of the content these systems churn out,” said Mittal, who leads Everest’s digital transformation and IT services group in Europe.
The notice of inquiry is an important, but not surprising, move, he said, noting there are four primary risks in AI (and genAI) that need special attention:
- Data security and privacy
- Explainability
- Ownership and responsibility
- Bias and ethics
The Copyright Office wants feedback from content producers (such as writers and studios), legal entities (regulators, lawyers, and courts), and technology providers (big tech companies and foundational model providers). That feedback is needed to hammer out a common regulatory framework and implement it consistently, Mittal said.
“I don’t see this as a negative for these technology companies, but a necessary step to further accelerate the adoption of responsible AI,” he said. “Currently, many large organizations are rethinking their AI efforts due to the regulatory and legal ambiguity around content and outcome ownership. Any framework/law to address that will help AI adoption in the long run.”
Earlier this year, the Biden administration unveiled an effort to address the risks of genAI, which has advanced at breakneck speed since ChatGPT burst onto the scene late last year, setting off alarm bells among industry experts.
Vice President Kamala Harris and other administration officials met with the CEOs of Google, Microsoft, OpenAI, and AI startup Anthropic. The rules that came out of that meeting, however, were meant to offer “guidance” and are not legally binding.
Avivah Litan, a distinguished vice president analyst for Gartner, said the copyright issues around genAI model training are “the pinnacle of a clash between old world regulations and new world innovations.”
“I’m not sure how it will be resolved,” she said.
GenAI tools respond to user queries using large language models (LLMs), algorithms with billions, even trillions, of parameters that generate content including text, voice, images, and video. LLMs, however, must be trained on data and information drawn from myriad sources, including the internet and, in many cases, companies that upload private information to tailor outputs for themselves and their customers.
Standards for training AI models and authenticating content are emerging and evolving. One example is the Coalition for Content Provenance and Authenticity (C2PA), a nonprofit industry group that has created technical standards for certifying the source and history (or provenance) of media content, including content created by genAI.
“So, it’s entirely possible to create a standard to identify copyright materials, when generated, that can then be sourced back when LLM and other GenAI models produce content,” Litan said. That kind of authentication could “indicate that the copyright marked materials were used in training to generate a given GenAI Model response.”
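To make Litan’s idea concrete, here is a minimal, hypothetical sketch in Python. It assumes training documents already carry copyright metadata; the `MarkedWork` record and the `build_index` and `attribute` functions are invented for illustration and are not part of C2PA or any vendor’s API. It also swaps in simple verbatim word n-gram overlap as the matching method. Real provenance systems would likely rely on signed manifests or model-level attribution rather than string matching.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MarkedWork:
    """A training document tagged with hypothetical copyright metadata."""
    work_id: str
    rights_holder: str
    license: str
    text: str


def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of lowercase, whitespace-split word n-grams in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def build_index(works: list[MarkedWork], n: int = 5) -> dict[tuple[str, ...], set[str]]:
    """Map each n-gram to the IDs of the marked works that contain it."""
    index: dict[tuple[str, ...], set[str]] = {}
    for work in works:
        for gram in ngrams(work.text, n):
            index.setdefault(gram, set()).add(work.work_id)
    return index


def attribute(response: str,
              index: dict[tuple[str, ...], set[str]],
              works_by_id: dict[str, MarkedWork],
              n: int = 5) -> list[MarkedWork]:
    """Return marked works that share at least one n-gram with a model response."""
    hits: set[str] = set()
    for gram in ngrams(response, n):
        hits |= index.get(gram, set())
    return [works_by_id[work_id] for work_id in sorted(hits)]


# Hypothetical marked training data (illustrative values only).
works = [
    MarkedWork("work-001", "Example Press", "All rights reserved",
               "the quick brown fox jumps over the lazy dog at dawn"),
    MarkedWork("work-002", "Open Texts", "CC-BY-4.0",
               "a completely different sentence about copyright and provenance"),
]
index = build_index(works)
by_id = {work.work_id: work for work in works}

response = "Model output: the quick brown fox jumps over the lazy dog today."
for work in attribute(response, index, by_id):
    print(work.work_id, work.rights_holder, work.license)
```

Here the response shares the five-word run “the quick brown fox jumps” with the first marked work, so only that work is reported. The n-gram length is a trade-off: shorter runs flag more incidental overlaps, while longer ones miss lightly paraphrased reuse.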
Code generation products, such as CodeWhisperer and GitHub Copilot, also have reference trackers that include licensing information for code recommendations and link back to the source code repository so users can understand the license terms, according to Litan.
“There’s no reason the industry can’t implement the same concept with copyright materials,” she said. “Regulators should arrive at relevant standards that can be implemented on a going forward basis, which would be least disruptive for the hosting LLM vendors, or on a retroactive basis.”
Done retroactively, the task would be more arduous: content creators would first have to mark their copyrighted content, and the LLM vendors would then have to use the marked content to retrain their models.
“Either way, implementing these solutions will take a lot of time and money, especially the second. … It behooves the regulators to come up with policies now and set time frames by which they can be implemented,” Litan said. “The other option is ignoring copyright materials altogether and changing the laws so that copyright protections are not honored with generative AI applications. The Japanese have gone down this path.”