When AI Remembers Too Much

New Research on Language Models and Copyrighted Content

The intersection of artificial intelligence and intellectual property law continues to generate pressing questions for academics, policymakers, and industry leaders. One striking example: researchers have managed to reproduce 96% of a Harry Potter book simply by giving a large language model the book's first sentence and employing some clever prompting.
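
To make the mechanism concrete, the sketch below shows the general shape of such a prefix-continuation loop: seed the model with an opening sentence, then repeatedly ask it to continue its own output. This is an illustration only, not the researchers' actual pipeline; `generate` is a hypothetical stand-in for whatever model API is being probed.

```python
# Illustrative sketch of a prefix-continuation extraction loop.
# `generate` is a hypothetical callable standing in for a production
# model API: it takes a prompt string and returns a text continuation.

def extract_from_prefix(generate, opening_sentence: str, max_rounds: int = 50) -> str:
    """Seed the model with an opening sentence, then repeatedly ask it to
    continue its own output, accumulating the reconstructed text."""
    reconstructed = opening_sentence
    for _ in range(max_rounds):
        # Keep the prompt bounded by only resending the most recent text.
        prompt = "Continue this passage:\n\n" + reconstructed[-2000:]
        continuation = generate(prompt)
        if not continuation.strip():
            break  # the model stopped producing new text
        reconstructed += continuation
    return reconstructed
```

How much of a book a loop like this actually recovers depends on how strongly the model has memorized the text and on the safeguards placed in front of it.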

A recent paper by Nasr et al., titled “Extracting books from production language models,” offers empirical evidence that deserves our attention, particularly for those of us who, as educators and researchers, both use these tools and train the next generation of technology leaders.

What the Research Shows

The study demonstrates that production-grade large language models – the same systems increasingly integrated into educational and professional workflows – can reproduce substantial portions of copyrighted books, sometimes near-verbatim. This is not a theoretical vulnerability. The researchers successfully extracted content from widely used commercial models, including those developed by OpenAI and Anthropic*.

What makes these findings particularly noteworthy is that extraction proved possible even in models equipped with safety measures designed to prevent such outputs. In some cases, researchers needed to employ “jailbreaking” techniques to bypass safeguards, but in others, models complied directly with extraction prompts*.
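
For a rough sense of what "near-verbatim" means in practice, one can score the character-level similarity between a model's output and the corresponding passage from the book. The snippet below uses Python's standard difflib for this; it is an illustrative proxy, not the metric reported in the paper, and the strings are placeholders.

```python
import difflib

def verbatim_overlap(model_output: str, reference: str) -> float:
    """Return a similarity ratio in [0, 1]; values near 1.0 indicate the
    model output closely mirrors the reference passage."""
    return difflib.SequenceMatcher(None, model_output, reference).ratio()

# Hypothetical usage with placeholder strings:
score = verbatim_overlap(
    "The quick brown fox jumps over the lazy dog.",
    "The quick brown fox jumps over the lazy dog!",
)
print(f"overlap ratio: {score:.2f}")  # close to 1.0 for near-verbatim text
```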

Implications for Academia and Policy

For those of us in higher education, this research raises several considerations:

  • Curriculum development: As we integrate AI tools into teaching and research, we must help students understand both the capabilities and the legal complexities of these systems.
  • Institutional policy: Universities will need clear guidelines on the responsible use of generative AI, particularly regarding intellectual property.
  • Research ethics: The study demonstrates the importance of interdisciplinary dialogue between computer scientists, legal scholars, and ethicists.

Looking Forward

The authors rightly emphasize that addressing these challenges requires continued collaboration between AI developers, legal experts, and policymakers*.

This is not a problem with a simple solution, but it is precisely the kind of problem our academic community is well positioned to help solve.

*: Read the paper: Extracting books from production language models
