Authors demand credit and compensation from AI companies using their work without permission | OpenAI, Alphabet, and Meta have been called out

L4sBot · 1 year ago


@fubo · 1 year ago

To date, it remains legal for humans to borrow a book from the library, read it, learn skills & knowledge from it, and apply what they’ve learned to make money — without ever paying the author or publisher.

Copyright does not, in general, grant control over the ideas in a work; only their specific expression. It deals with copying the text; it is not a tax on the information or knowledge contained in that text.

It also does not assure the author or publisher of a share of all revenues that anyone is ever able to make using the knowledge recorded in a work.

@[email protected] · 1 year ago

I mostly agree, but we have to recognize that AI (currently) is not an intelligence. It could be argued that the data it bases its behavior on is a copy, which would allow the copyright owner to dictate how it’s used. I think it’s probably going to be considered more like a derivative work, which generally falls within fair use.

It’s a tough issue and it’s going to be interesting to see how it plays out.

@[email protected] · 1 year ago

I suspect the problem is that AI copies whole sentences originally published by authors, not that it just “learns” from them.

@fubo · 1 year ago

A while back I tried to get ChatGPT to recite the text of Harry Potter and the Philosopher’s Stone to me, as a test of just how much copyrighted text it’s willing to recite.

It got partway through the first sentence before freezing up, presumably due to a sensitivity to copyright.

So I suspect that at least OpenAI are taking significant steps already to prevent their systems from reciting copyrighted text verbatim.

@average650 · 1 year ago

I just tried it with Bing Chat, and it actually explains that it can’t because it would violate the author’s copyright.

@[email protected] · 1 year ago

But it doesn’t copy full sentences. If it did, maybe these models wouldn’t be such black boxes. They build an utterly, insanely huge matrix of data that is basically just weights for parameters, and there are billions of parameters (which make up the entirety of what the LLM “knows” or “can know”). It’s closest to text prediction. Even though it doesn’t store full sentences, if a sentence appeared enough times in the training data, it can predict the rest of it. It can even do that without having scraped the book, simply because it scraped something else (likely many something elses) that contained the quote.
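To illustrate the "predict the rest of a sentence it saw often" point, here is a toy sketch. It is a simple count-based trigram model, not how an LLM actually works (LLMs learn neural network weights rather than storing word counts), and the corpus is invented for illustration. The quote appears only in "reviews" and "forum posts"; the book itself is never in the training data, yet the model can still complete the quote.

```python
from collections import Counter, defaultdict

# Toy corpus (invented for illustration): the same quote appears in several
# reviews and forum posts, but the "book" it comes from is never included.
CORPUS = [
    "my review: the quick brown fox jumps over the lazy dog",
    "forum post quoting it: the quick brown fox jumps over the lazy dog",
    "another summary says the quick brown fox jumps over the lazy dog",
    "a fox is quick",
]

def train_trigrams(documents):
    """Count which word follows each two-word context across all documents."""
    follows = defaultdict(Counter)
    for doc in documents:
        words = doc.lower().split()
        for w1, w2, nxt in zip(words, words[1:], words[2:]):
            follows[(w1, w2)][nxt] += 1
    return follows

def complete(follows, prompt, max_words=10):
    """Greedily append the most frequent next word given the last two words.

    The prompt must contain at least two words.
    """
    words = prompt.lower().split()
    for _ in range(max_words):
        candidates = follows.get((words[-2], words[-1]))
        if not candidates:
            break  # context never seen in training: stop predicting
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

model = train_trigrams(CORPUS)
print(complete(model, "the quick brown"))
```

Running it prints `the quick brown fox jumps over the lazy dog`. The model never saw the "book", only other people quoting it, which is the commenter's point about quotes leaking in from secondary sources.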

@average650 · 1 year ago

The lines between a specific expression and the idea behind them are very blurred with AI…

@kromem · 1 year ago

Yes, but it isn’t legal to download a repository of pirated books to read and learn from.

Did OpenAI check out the books they trained their model on from the library one at a time?

I’m generally very much against the copyright creep that’s being advocated by some trying to get training to be infringement.

But at the same time, OpenAI should at least have needed to buy retail copies of the books they were using to train the AI, or to get access through other legal means. One of the chief allegations against them is that they effectively built the AI on the open seas of piracy, using a data set that contained illegally distributed copyrighted content.

So the overreach by copyright holders to claim rights to training is BS, but there may well still be a valid claim against OpenAI in how they went about it.

@fubo · 1 year ago

Even if I illegally download a book about carpentry, and learn how to build a doghouse from it, and go into the business of building doghouses, the extent of my liability to the copyright holder does not include my entire doghouse-building revenue.

Even if I subsequently teach other people to build doghouses, if I’m not further copying the actual contents of the book, I am not further liable for copyright infringement.

Copyright is actually pretty narrow, and should not be construed to give authors or publishing companies unbridled control over the ideas or knowledge contained in works.

@kromem · 1 year ago

Yes, but you are liable for damages in having pirated it.

Where did I say anything about being liable for future revenue?

But it’s a special level of dumb to build a billion-dollar company on material that you pirated, when it can be confirmed that you possessed it and that your end product used it.

Suits tend to include multiple claims, trying to get the plaintiffs as much compensation as possible. Even if all the crap about training as infringement gets thrown out (as it should), the claim that OpenAI committed one of the largest copyright infringements in recent history, by obtaining and using pirated material in violation of copyright law, is likely going to have hefty damages attached if it can be proved (which it will be if it happened).

If you downloaded music from Napster in the early 2000s and got caught, did the RIAA go after you for only the retail price of the song?

If you illegally downloaded a book about carpentry, and get caught, do you think you don’t have to pay anything for having illegally downloaded it?

@fubo · 1 year ago

“Yes, but you are liable for damages in having pirated it.”

Sure, if someone can show that you did.

Based on my own experimentation, ChatGPT knows facts about the Harry Potter novels, but it does not recite the text of them when asked to do so. Does it contain a pirated copy of them? I can’t tell. Maybe it just reads a lot of open-source fanfic off AO3.

@kromem · 1 year ago

I’m starting to realize several people in this thread don’t understand how subpoenas work.

Saik0 · 1 year ago

“Yes, but it isn’t legal to download a repository of pirated books to read and learn from.”

Sure, but that still doesn’t change any of the above statements. If I steal a book from a library and read it… you get the point. What exactly can you get me for? The cost of the book, plus maybe a civil penalty? This is going to be a nothing burger for these writers if they’re hoping for a payday. Further, how do we know which specific repository the AI got its content from? It could be that the content came from some forum post of a person summarizing a chapter + a review of the book + <insert tons more content from another source>. There’s no evidence I’ve seen thus far that any of these AI systems are accessing books illegally to begin with, or that those books were the only source they derive their responses from.

The AI isn’t reproducing the book and thus isn’t violating copyright, as literally everything it will produce is derivative, which is protected. That holds unless you can get the AI to recite a book back verbatim, which I’ve not been successful in doing personally, and I’ve seen no evidence of anyone else doing it either.

@kromem · 1 year ago

“Cost of the book + maybe a civil penalty?”

Does no one remember the days of Napster and the multiples over retail cost that people caught pirating were charged?

And technically piracy is a federal crime, so there could even be criminal charges.

A “nothing burger”?

Let’s see… oh my, what’s this? 17 U.S.C. § 504(c)(2):

In a case where the copyright owner sustains the burden of proving, and the court finds, that infringement was committed willfully, the court in its discretion may increase the award of statutory damages to a sum of not more than $150,000.

That’s per work infringed.

Nothing burger indeed.

OpenAI is on the other end of over two decades of fearmongering and lobbying to enact laws with ridiculous penalties for piracy in the digital age.

As for how we know where they got the information, that’s what subpoenas are for in a legal proceeding. Even if training information is not publicly disclosed, whether they did or didn’t pirate content is going to come out privately in court.

The AI doesn’t need to reproduce the book for OpenAI to have infringed by illegally sourcing the copyrighted material they used in training.

Saik0 · 1 year ago

You failed to read my post. You jumped straight into an assumption that piracy can be proved rather than actually reading what I’ve posted.

If you’re going to continue with strawman arguments then please return to reddit.

@kromem · 1 year ago

If piracy occurred, it can be proved by talking to employees under oath and subpoenaing relevant email records.

The idea the court would need to reverse engineer ChatGPT to find out is absurd.