Episode transcript:
Note: This transcript is generated from a recorded conversation and may contain errors or omissions. It has been edited for clarity but may not fully capture the original intent or context. For accurate interpretation, please refer to the original audio.
JOHN QUINN: This is John Quinn and this is Law, disrupted, and today it’s my great pleasure to have on the show some lawyers who have achieved a $1.5 billion settlement in the very cutting-edge area of AI, AI training, and copyrighted works. We’re gonna be speaking with Rachel Geman of the Lieff Cabraser Heimann & Bernstein law firm and Justin Nelson and Rohit Nath of the Susman Godfrey law firm.
Two law firms we know extremely well. Sometimes our firm, Quinn Emanuel, is on the same side of the V; maybe more often we’re on the opposite side of the V. But we have a long history together with these firms and a lot of mutual respect. And we’re here to talk about the incredible settlement that they achieved.
A $1.5 billion settlement for a class of authors, regarding over 450,000 copyrighted works, with the AI company Anthropic, which used, I guess it’s agreed now, pirated copyrighted training materials to train their large language model. And as I understand it, this case does not involve the much-mooted issue of whether the use of copyrighted materials in itself to train an LLM, materials that are otherwise legitimately obtained, not pirated, is copyright infringement or fair use, because it was agreed that these came from pirated websites. Is that correct?
JUSTIN NELSON: Well John, it’s mostly correct. This is Justin. Our original complaint did allege the use of the material, but really the focus of the original complaint was on the training of these large language models, the LLMs, that training them violated the copyright law, that it was not fair use.
In our initial status conference with the court, I talked about the two theories that existed. One was the so-called Napster theory, which is the piracy theory, and the other was that the training is not fair use. Judge Alsup issued a summary judgment decision in June of this year where he found two things.
The first thing he found was that the way Anthropic acquired these books was, in his words, irredeemably wrong, and that therefore there was a copyright claim for the download and use of these pirated books, putting aside any training. He also found that these books did qualify for fair use as to the AI training part of it.
To be clear, we disagreed with that part of the ruling. Obviously Anthropic disagreed with the piracy part of the ruling, and we went on; we were scheduled for a trial on December 1st of this year in Judge Alsup’s courtroom. And we reached a settlement right at the close of fact discovery.
JOHN QUINN: So, as to the second part of that, he ruled in June that the training was fair use because, as I understand it, it was transformative.
JUSTIN NELSON: That was the main part of that decision. We obviously, you know, really disagree with that part of it. That’s an issue that is very much alive in a number of remaining cases, both class action cases where we are counsel and other cases against companies that are doing the training on books, but also, uh, across the board.
And we expect that that issue will continue to be fought out in courts over the next years.
JOHN QUINN: So this case, the claims that were settled in this case, kind of raise the issue: what do these AI companies do with copyrighted works besides using them to train the large language models?
Here, as I understand it, and you know, correct me if I’m wrong, they had these pirated materials and they made a copy of them, and that is the crux of the claims that were settled. Is that correct, Rachel? Maybe I could get you to address that.
RACHEL GEMAN: Yeah, no, absolutely. So a hundred percent, reproducing copyright-protected material from pirate sites is bad. It is wrong. It is irredeemably infringing. The question is, what in the heck do these companies do with all these books and other materials? I mean, certainly, yes, they train on books. The AI companies want high-quality content.
But the books are helpful generally. They might help them test their models, study them, decide what they want to get. These companies really wanted books, and they wanted books really quickly. And so our view is that the acquisition of the books per se is wrong. It’s not fair. And the uses of the books raise other problems as well.
JOHN QUINN: Rohit, where do these pirated works come from? What are the sites where this kind of material is available?
ROHIT NATH: So there were two principal ones that Anthropic used, and that I would say the industry at large has used. One is a site called Library Genesis, a website that comes out of Russia. It has been shut down a few times; there were a couple of cases against Library Genesis in the 2010s, but it’s still available. It’s hard to completely shut these things down. It’s a torrent site, and that’s one of the places Anthropic went to download, you know, millions of copies of books. The other place has gone by a few different names; at the time they downloaded from it, it was literally called the Pirate Library Mirror. Mirror refers to a mirror site, meaning that it’s a mirror of what was once known as Z-Library.
And that’s another spot where they downloaded hundreds of thousands of books. I guess there’s one more place they downloaded books from, which originated from a website, a torrent, called Bibliotik. The dataset was called Books3, and it was widely available as part of a publicly available training dataset called The Pile.
JOHN QUINN: Right. But this settlement settled the entire case, including the claims where the judge found that there was fair use. The whole case is over, is that correct?
ROHIT NATH: It’s correct that the whole case is over. But there were two sources of books for Anthropic. Basically they had a book-scanning program: they went out and bought used and new books and scanned them. And then they also downloaded books from pirated websites. If there’s a book in the class that they downloaded from a pirated website, your copyright claims up until August 26th, 2025, the day we signed our term sheet, those are released if you participate in the settlement. But if your book was one that was purchased and scanned, and it isn’t in the pirated dataset, your claims are not released, right?
JOHN QUINN: So, what will this mean for the copyright owners? I mean, $1.5 billion is a big number, but this covered a lot of works and a lot of authors.
What will this mean? And maybe you could tell us a little bit about the claims process, because this cannot have been straightforward.
JUSTIN NELSON: Well, thanks. $1.5 billion in fact is the largest known copyright recovery of all time. Full stop. Class action or not a class action.
JOHN QUINN: Well, congratulations on that.
JUSTIN NELSON: Thank you.
It’s the largest recovery. It means a little over $3,000 per work. We designed a claims process by which we are honoring contractual rights. Many of these works have agreements between the authors and the publishers. So for most of the works at issue, we set up a non-mandatory default of a 50-50 split between authors and publishers, because that is reflective of most, but not necessarily all, contracts, especially for academic presses and the so-called trades, the big publishing houses like Penguin Random House, Hachette, and HarperCollins. For educational works, those contracts differ so much that we weren’t able to set a default, so the parties will have to say what they think the split should be. And then we’ll go through a claims process. There is a special master set up if there are any disputes. We don’t expect there to be many disputes because, as Judge Alsup pointed out, authors and publishers usually have found a way to get along. And that was in fact part of the reason for his class certification.
And so far that has been borne out. There was great cooperation between both the author owners and the publisher owners, which was a huge strength of this case. So the $3,000 per work is a real, meaningful recovery, obviously, just for one work. But many of the authors and publishers have multiple works in suit.
And I think, John, most importantly, a $3,000-per-work number, the largest copyright recovery in history, is great in its own right. It also sends a strong message that copyright owners deserve compensation. And I think we’ve already seen that it sent shockwaves throughout the copyright and creative community.
RACHEL GEMAN: And if I could add to that, I agree with everything that was said. The other thing that this settlement gives to our class, to our authors, to our publishers, is that the pirated works are gonna be destroyed by Anthropic. Anthropic is certifying that it hasn’t used these works in training its commercial models.
The former is, I think, more than just a cleanup. It’s hard to overstate how upset and shocked and almost exposed our class members feel when they learn that their books are on these lists. I mean, everyone’s had the experience of being startled to learn that something of theirs is out there, whether it’s an email or something else.
And imagine that tenfold, a hundredfold, for the books that they wrote. So I think the destruction is really important. Justin talked about sending a signal, and I echo that. The size of the settlement and the contours of the settlement send a signal. And the fact that, if it weren’t for the settlement, we were really ready, willing, and eager to go to trial, to present these facts to a jury, maybe that sends a signal too.
JOHN QUINN: You were interviewed by The American Lawyer when this became public, and one of the things you were asked is, well, $3,000 a work, is that enough?
When the statutory damages for willful infringement could have gone as high as $150,000 per work, it is a much lower number, and you kind of explained some of the challenges that you faced, why $3,000 was a very good settlement. For the benefit of our audience, could one of you, Rohit, maybe you could talk about that?
ROHIT NATH: Yeah, absolutely. The $3,000 a work, I think, is a fantastic number. I’ll start with some of the challenges we faced in the case, how we got where we were, and what challenges we faced going forward. First of all, this is, you know, a new area of the law.
It’s a risky case. There was another case that had somewhat similar facts, against Meta.
JOHN QUINN: Yeah. That was the case before Judge Chhabria that came out around the same time. Correct?
ROHIT NATH: Yes. And there were allegations of piracy. Meta also went to Library Genesis, and the plaintiffs in that case lost.
Ultimately, you know, that case is still going, there are some other issues, but Judge Chhabria said the plaintiffs’ lawyers made the wrong argument. And so that was an indication of just how risky any of these cases can be where you’re dealing with new technology in a copyright case.
You know, the backdrop was that the last major battle between rights holders and large technology companies over books was the Google Books case, which Google ultimately won. So that was the backdrop we were fighting against. And, you know, we think we’re right, absolutely.
But there was always a risk that we got poured out entirely.
JOHN QUINN: For those who don’t know, “poured out” is a Texas lawyers’ term for losing, I’ve come to learn.
ROHIT NATH: That’s right, I did grow up at Susman Godfrey. And so there was that risk. And then, let’s say we won, we were successful at summary judgment; I think we developed a great record for willful infringement. And then the high end of damages, absolutely the tippy top of it, was at $150,000 a work.
But a jury can come out anywhere in that range, and trying a copyright case on statutory damages before a jury, you really have no idea where they’re gonna go. You can present your evidence, but the jury has wide discretion. It’s a little bit different than trying an actual damages case, where you can kind of say, okay, here’s the top amount, or get a sense of what the number really should be or would be.
But the jury can pick anything in that range, and the studies we had looked at, there was a fairly recent one, showed that in most copyright cases juries select the minimum, which is $750 a work. And Judge Alsup at one point signaled that he thought the minimum might be appropriate here, because the argument was that they should have just bought a copy of the book. We disagreed with that. We thought there was willfulness.
JUSTIN NELSON: Well, he didn’t signal that, but it was certainly a risk. I mean, he certainly didn’t say that’s how he was gonna come out. But even when I presented the preliminary approval to the court, he did mention that there was a chance the relevant marker would be the cost of a used book, which would be well under the $750 statutory minimum.
And of course he would’ve kept an open mind, but you just never know. Sorry, go ahead.
ROHIT NATH: Yeah, and then, you know, the other risk layered on top was that Anthropic had a Rule 23(f) petition pending at the time we settled. And in a class action where you’re going to trial, there’s always a risk of reversal of your class decision, either on an interlocutory appeal under 23(f) or after trial if you get a verdict.
And so against all those risks, we think $3,000 a work is just a slam dunk settlement.
JOHN QUINN: Look, anybody who’s gonna criticize you for only settling for $1.5 billion, you know, I’ve got some unsuccessful plaintiffs’ lawyers they should talk to. But did Judge Alsup have any difficulty approving the settlement? Were there some issues you had to get him past, some questions that he had?
RACHEL GEMAN: Yeah, I mean, he took a very hard and searching look at the settlement. He really wanted class counsel, us, to make sure that we were doing the work of coming up with a plan of allocation. He asked us a whole bunch of questions.
He wanted to make sure all the traps were run. He didn’t want there to be any daylight between the parties in terms of their understanding of the key terms. Initially we had suggested essentially a working group of stakeholders to come up with a plan of allocation.
And the court kind of said, no, you do that. And we did. We worked extremely hard to replicate preexisting contractual arrangements, to honor them, just as the judge told us to do; to ensure a fair and efficient claims process, just as Rule 23 and the judge told us to do; and to come up with something that we are extraordinarily proud of, that the judge approved. And we are off to the races.
JUSTIN NELSON: Well, I’ll just add, from the trial lawyer’s perspective, it was great to have Judge Alsup, and we all know, John, you know this, right? The most important thing, obviously you want the judge to side with you, you wanna present credible arguments, but you want a judge who understands the issues.
Obviously, Judge Alsup didn’t agree with us on a number of issues, including on training, but he was fair, he was thoughtful. And maybe just from a management perspective, for legal and non-legal listeners alike, perhaps the most important thing is keeping a trial schedule on track, keeping the case on track.
And so, in fact, this case was filed about a year after some of the other AI cases, including the OpenAI and Microsoft cases.
JOHN QUINN: When was it filed?
JUSTIN NELSON: Filed in August of 2024, and almost exactly 365 days later, we had that term sheet for $1.5 billion. And the reason for that, among many other things, is that Judge Alsup kept his schedule and we were put through our paces.
I described it to others as, you know, sprinting a marathon through a minefield with grenades being lobbed at you the entire time. But, you know, that was…
RACHEL GEMAN: A good day.
JUSTIN NELSON: Yeah, right. But as you know, that’s trial. And keeping that trial schedule and keeping everybody on pace, I mean, there was always a risk of decertification of the class, which he mentioned, which was always a risk for us as well. But it was very clarifying, and it was clarifying to both sides.
RACHEL GEMAN: And I would add to that: not just keeping a trial schedule that’s efficient and fast, but setting one. I mean, at the very, very first conference, the court said that if he wanted a December trial, he wanted discovery done at the end of August.
He had a very clear view about how the case was gonna go. I mean, without prejudging anything substantively, he had a vision for how the parties were gonna work hard, they were gonna be efficient, no one was gonna waste time. He essentially ordered that a deposition occur maybe a few days after the first conference.
He wanted the parties to take the case seriously. And of course, we wanted to take the case extraordinarily seriously. It was a top-of-mind case, and that was true from the very beginning.
JOHN QUINN: How did the settlement come together? Did you use a mediator? Or how did it come about?
JUSTIN NELSON: We used, it’s public, Judge Layn Phillips, a former district court judge, who mediated. It did not settle at the mediation. It settled about a week after that, after numerous discussions and back-and-forths. We reached a term sheet. The mediation, I believe, was on August 19th.
The term sheet was late at night on August 25th. We announced it to the court immediately. And our original settlement papers were due on September 5th, when we reached a final long-form settlement agreement.
JOHN QUINN: I mean, to go back to the issue about whether it’s infringement to use copyrighted materials to train large language models. You know, it seems like every time there’s a new technology that involves the use of content, you have copyright issues surface. How does this new wine fit in the old bottles of copyright law? And usually, I mean, Napster’s an exception, and I think there are reasons why Napster was an exception.
Arguably, the fact that these were pirated works makes your case somewhat different. My impression is that historically there’s kind of been a thumb on the scales in favor of the new technology and finding fair use. Do you have any reaction to that? I know you’re on the other side of that, but I think historically that’s kind of a fair summary.
RACHEL GEMAN: I think it depends what you mean by historically, right? Because certainly if you look at some of the early internet, post-internet cases, it goes without saying that, as you suggest, there were a lot of cases finding that certain conduct by tech companies was fair use.
Now, we think those cases were distinguishable for various reasons. But more holistically, the conduct at issue in these cases actually has historical precedents showing that copyright holders need to be protected. There are certainly cases the other side can cite, but we have a good background of cases too, cases that weighed the equities and showed that behavior somewhat similar to this really requires the protection of the copyright holder.
So I guess I’m saying both things. Yes, certainly there are a lot of cases that defendants cite. But we’ve got cases too. And I would just encourage listeners to remember that the cases the defendants really rely on, they’re not God speaking to Moses; they were based on particular sets of facts. There were a lot of cases around the turn of the century and later this century that are certainly worth looking at, but they’re not the entire history of the protection of copyright.
JOHN QUINN: I mean, to be clear, in the cases that we have, I mean, we’re on the side of the AI companies.
JUSTIN NELSON: Yes. You’re adverse to us on one of them actually. Oh no.
JOHN QUINN: I don’t even know which one you’re referring to, but we’re always adverse to you guys on some. We have a case, as you probably know, against Anthropic for Reddit. But it’s not a copyright case.
It’s a contract case, a terms-of-service case. I mean, Reddit has been down this road with more than one AI company, some that aren’t public, that never saw the light of day, because Reddit data is such high-quality data in the AI world. And some companies have just kind of wholesale hoovered that up.
JUSTIN NELSON: Well, I just wanna add to what Rachel said. Again, let’s put aside what one case said in 1996 versus another case in 1980 versus another in 2005. Let’s just go back to common sense. What the AI companies in these cases are doing, as you say, John, is hoovering up the world.
They are strip-mining the entirety of humanity’s creations, the combined works of humanity. That is their raw input, and they don’t wanna pay for it. As we said in the tech tutorial in front of Judge Alsup, that is their oil. They are literally going in and, in a very real manner, taking everybody’s creativity and putting it together on these fancy chips, which have powered lots of people and lots of companies to lots of money, to run these fancy algorithms that predict what the next word will be. That’s all these are. They are fancy predictors. And how do they fancy predict?
They fancy predict based upon your and my expression and the combined body of these copyrighted works across the entire century, let alone these books, which are so important. I will say one difference, and you mentioned the Reddit case. The one really strong hook that copyright law provides, that really was a strong hook for us in Anthropic, was this idea of statutory damages.
We mentioned it before, but if you’re going just off of a terms of use or a contract case, or if you’re talking about other ways to recover, you potentially don’t have that hook of statutory damages. For those who registered their copyrights before the piracy, before the training, whatever the act was, it really provides a strong hook that allowed for where we are today.
JOHN QUINN: Well, it’s really an exciting time in our profession and for our practices. Have any of you read the book If Anyone Builds It, Everyone Dies? No? Do you know the book?
RACHEL GEMAN: I’ve heard of it.
JOHN QUINN: It’s worth reading. It’s not hysterical; it’s sobering. I mean, it’s kind of off subject, but in a way not. But it’s super interesting. What I learned from it is that these programs, these functionalities, are not crafted, not put together; they’re grown. You plant a seed, and you think you can in some sense train it as to what its quote-unquote preferences are. But my takeaway, one of the authors’ key points, is that there’s an unpredictable element to that. These programs tend to go off in different directions.
They’re not predictable. That’s the scary part.
JUSTIN NELSON: I mean, has anybody seen any 1980s science fiction? Have we thought about some of this stuff, the Terminator and whatever else we’re talking about, and why? Again, this goes back to this idea: let’s go back to common sense. Shouldn’t creators have a right and a say about whether they are gonna participate in this technology, whose risks even its creators acknowledge?
I mean, Dario Amodei of Anthropic has publicly said that he thinks there’s somewhere between a 15 and 25% chance that this will basically destroy humanity.
JOHN QUINN: Yeah, like the guy who won the Nobel Prize, who was at Google DeepMind. What? From Toronto? What’s his name?
RACHEL GEMAN: Oh, yeah. Geoff…
I’m gonna butcher his name and embarrass myself.
JOHN QUINN: Hinton? This guy, last name starts with…
RACHEL GEMAN: Geoff Hinton.
JOHN QUINN: Yeah. He’s quoted saying there’s only a 20% chance that it’ll destroy all of humanity. Yeah.
JUSTIN NELSON: I mean, shouldn’t we as humans have a right to have a say in this? Yeah.
JOHN QUINN: Right. Well, at that point, I think the copyright, you know, rights are gonna be… well, no, hold on. Like this case says, the law gives us tools, right?
JUSTIN NELSON: Yes, you’re right in the sense that we have agency here, right? And this is really why I think so many of us feel so strongly that the training itself really is a copyright violation, and that we really should be thinking about this not just in terms of whether it fits within the case law, yes, it fits within the case law, but that this is in many ways existential, and that copyright owners really deserve a chance and a say in how their work is used.
JOHN QUINN: I can see how you got your 1.5 billion.
RACHEL GEMAN: But it’s almost like one of these Pascal’s Wager things, because on the one hand you have some very prominent AI scientists, right, who will say these LLMs have already peaked. They’re just not smart. They don’t have any way to feel the world.
Um, you know, they’re just next-word predictors, et cetera. And then you have the 15-to-20% crowd. And I think it’s hard for a lot of regular folks out there to be like, what? How do I think about this technology? Is it gonna kill me, or am I lucky if it just takes my job? Right? Either way, under either scenario, you gotta protect the content creators who are so integral to this creation, even if it just turns out to be just another technology.
Right, all the more reason that the creators who are so instrumental in it should be compensated. And if indeed there is this risk, that just underscores the permissioning and the consent that these folks really want. We cannot stress this enough when we talk to our clients, to class members, to rights holders.
The lack of consent really sticks in their craw, especially because this is eminently doable. A world that can create these very, very complicated models is consistent with a world where you can figure out how to get permission from the copyright holders. This is not our hardest problem.
JOHN QUINN: Well, thank you. This has been a fascinating conversation for me. We’ve been speaking with Rachel Geman of the Lieff Cabraser Heimann & Bernstein firm and Justin Nelson and Rohit Nath of the Susman Godfrey firm about their historic $1.5 billion settlement with Anthropic for a class of authors in the copyright action.
This is John Quinn, and this has been Law, disrupted.
Thank you for listening to Law, disrupted with me, John Quinn. If you enjoyed the show, please subscribe and leave a rating and review on your chosen podcast app to stay up to date with the latest episodes. You can sign up for email alerts at our website, law-disrupted.fm, or follow me on X at @jbqlaw or at @quinnemanuel.
Thank you for tuning in.
Published: Oct 28 2025






