Microsoft and OpenAI Accused of Copyright Theft

13 Aug 2024

The Center for Investigative Reporting Inc. v. OpenAI court filing, retrieved on June 27, 2024, is part of HackerNoon’s Legal PDF Series. This part is 10 of 18.

DEFENDANTS’ ACTUAL AND CONSTRUCTIVE KNOWLEDGE OF THEIR VIOLATIONS

105. The OpenAI Defendants have acknowledged that use of copyright-protected works to train ChatGPT requires a license to that content. Recognizing that obligation, the OpenAI Defendants have, in some instances, entered into agreements with large copyright owners such as Associated Press, the Atlantic, Axel Springer, Dotdash Meredith, Financial Times, News Corp, and Vox Media to obtain licenses to include those entities’ copyright-protected works in Defendants’ LLM training data.

106. The OpenAI Defendants are also in licensing talks with other copyright owners in the news industry, but have offered no compensation to Plaintiff.

107. In a May 29, 2024 interview, OpenAI’s Chief of Intellectual Property and Content, Tom Rubin, stated that these deals focus on “the display of news content and use of the tools and tech,” and are thus “largely not” about training.[15] This admission, while qualified, confirms that these deals involve training, at least in part.

108. The OpenAI Defendants created tools in late 2023 to allow copyright owners to block their work from being incorporated into training sets. This further corroborates that the OpenAI Defendants had reason to know that use of copyrighted material in their training sets is copyright infringement.

109. The creation of such tools also corroborates that the OpenAI Defendants had reason to know that their copyright infringement is enabled, facilitated, and concealed by their removal of author, title, copyright, and terms of use information from their training sets.

110. Defendants had reason to know that the removal of author, title, copyright notice, and terms of use information from copyright-protected works and their use in training ChatGPT would result in ChatGPT providing responses to ChatGPT users that abridged or regurgitated material verbatim from copyrighted works in creating responses to users, without revealing that those works were subject to Plaintiff’s copyrights. This is at least because Defendants were aware that ChatGPT responses are the product of its training sets and that ChatGPT generally would not know any author, title, copyright notice, and terms of use information that was not included in training sets.

111. Upon information and belief, Defendants had reason to know that the removal of author, copyright notice, and terms of use information from copyright-protected works used in synthetic searching would result in ChatGPT or Copilot providing responses to ChatGPT users that abridged or regurgitated material verbatim from copyrighted works in creating responses to users, without revealing that those works were subject to Plaintiff’s copyrights. This is at least because Defendants were aware that Copilot’s and later versions of ChatGPT’s responses to prompts are the product of the articles encoded in their computer memory, from which, upon information and belief, Defendants removed author, copyright notice, and terms of use information.

112. Defendants had reason to know that users of ChatGPT would further distribute the results of ChatGPT responses. This is at least because Defendants promote ChatGPT as a tool that can be used by a user to generate content for a further audience.

113. Defendants had reason to know that users of ChatGPT would be less likely to distribute ChatGPT responses if they were made aware of the author, title, copyright notice, and terms of use information applicable to the material used to generate those responses. This is at least because Defendants were aware that at least some likely users of ChatGPT respect the copyrights of others or fear liability for copyright infringement.

114. Defendants had reason to know that ChatGPT would be less popular and would generate less revenue if users believed that ChatGPT responses violated third-party copyrights or if users were otherwise concerned about further distributing ChatGPT responses. This is at least because Defendants were aware that Defendants derive revenue from user subscriptions, that at least some likely users of ChatGPT respect the copyrights of others or fear liability for copyright infringement, and that such users would not pay to use a product that might result in copyright liability or did not respect the copyrights of others.

115. If a commercial user of Defendants’ ChatGPT and Copilot products is sued for copyright infringement, Defendants have committed to paying the user’s costs in defending against the infringement claim, and to indemnifying the user for an adverse judgment or settlement. These commitments apply only if the user uses the product as advertised. In particular, Microsoft’s “Copilot Copyright Commitment” applies only if the user “used the guardrails and content filters we have built into our products,”[16] and OpenAI’s “Copyright Shield” does not apply if the user “disabled, ignored, or did not use any relevant citation, filtering or safety features or restrictions provided by OpenAI.”[17] Thus, Defendants know or have reason to know that ChatGPT and Copilot users are capable of infringing and likely to infringe copyright even when used according to terms specified by Defendants.


About HackerNoon Legal PDF Series: We bring you the most important technical and insightful public domain court case filings.

This court case, retrieved on June 27, 2024, from motherjones.com, is part of the public domain. The court-created documents are works of the federal government and, under copyright law, are automatically placed in the public domain and may be shared without legal restriction.

[15] Charlotte Tobitt, OpenAI content boss: ‘Incumbent’ on us to help small publishers, not just the giants, PressGazette (May 30, 2024), https://pressgazette.co.uk/platforms/openai-tom-rubinpublishers-news/.

[16] https://www.microsoft.com/en-us/licensing/news/microsoft-copilot-copyright-commitment.

[17] https://openai.com/policies/service-terms/.
