Whether large language models are “memorizing,” “regurgitating” or “learning” information is at the center of many debates around generative artificial intelligence’s legal standing, including in the most recent New York Times’ complaint against OpenAI and Microsoft alleging copyright infringement.

While the New York Times v. Microsoft, OpenAI suit is just getting started, a recent study unrelated to the lawsuit found that about 60% of OpenAI’s GPT-3.5 model outputs contained “some form of plagiarized content,” with 45.7% of all outputs containing “identical text.”