Authors demand credit and compensation from AI companies using their work without permission | OpenAI, Alphabet, and Meta have been called out

L4sBot · 1 year ago


@kromem · 1 year ago

Yes, but it isn’t legal to download a repository of pirated books to read and learn from.

Did OpenAI check out the books they trained their model on from the library one at a time?

I’m generally very much against the copyright creep being advocated by those trying to get training treated as infringement.

But at the same time, OpenAI should at least have had to buy retail copies of the books it used to train the AI, or get access to them through legal means. One of the chief allegations against them is that they effectively built the AI on the open seas of piracy by using a data set that contained illegally distributed copyrighted content.

So the overreach by copyright holders to claim rights to training is BS, but there may well still be a valid claim against OpenAI in how they went about it.

@fubo · 1 year ago

Even if I illegally download a book about carpentry, and learn how to build a doghouse from it, and go into the business of building doghouses, the extent of my liability to the copyright holder does not include my entire doghouse-building revenue.

Even if I subsequently teach other people to build doghouses, if I’m not further copying the actual contents of the book, I am not further liable for copyright infringement.

Copyright is actually pretty narrow, and should not be construed to give authors or publishing companies unbridled control over the ideas or knowledge contained in works.

@kromem · 1 year ago

Yes, but you are liable for damages in having pirated it.

Where did I say anything about being liable for future revenue?

But it’s a special level of dumb to build a billion dollar company on material that you pirated, when your own end product can confirm that you possessed and used it.

Suits tend to have multiple claims, trying to get the plaintiffs as much compensation as possible. Even if all the crap about training as infringement gets thrown out (as it should), the claim that OpenAI committed one of the largest copyright infringements in recent history by obtaining and using pirated material is likely going to have hefty damages attached if it can be proved (which it will be if it happened).

If you downloaded music from Napster and got caught in the early 2000s, was the fine you got from the RIAA only the retail price of the song?

If you illegally download a book about carpentry and get caught, do you think you don’t have to pay anything for having illegally downloaded it?

@fubo · 1 year ago

Yes, but you are liable for damages in having pirated it.

Sure, if someone can show that you did.

Based on my own experimentation, ChatGPT knows facts about the Harry Potter novels, but it does not recite the text of them when asked to do so. Does it contain a pirated copy of them? I can’t tell. Maybe it just reads a lot of open-source fanfic off AO3.

@kromem · 1 year ago

I’m starting to realize several people in this thread don’t understand how subpoenas work.

Saik0 · 1 year ago

Yes, but it isn’t legal to download a repository of pirated books to read and learn from.

Sure, but that still doesn’t change any of the above statements. If I steal a book from a library, read it… You get the point. All you can get me for is for… What exactly? Cost of the book + maybe a civil penalty? This is going to be a nothing burger for these writers if they’re hoping for a payday. Further, how do we know which specific repository the AI got its content from? It could be that the content it got was from some forum of a person summarizing a chapter + a review for the book + <insert tons more content from another source>. There’s no evidence I’ve seen thus far that any of these AI systems are accessing books illegally to begin with. Or that those books were the only source that it derives its responses from.

The AI isn’t reproducing the book and thus isn’t violating copyright as literally everything it will produce is derivative which is protected. Unless you can get the AI to recite a book back verbatim… Which I’ve not been successful in doing personally… and I’ve seen no evidence of anyone else doing either.

@kromem · edit-2 1 year ago

Cost of the book + maybe a civil penalty?

Does no one remember the days of Napster and the multiples over retail cost that people caught pirating were charged?

And technically piracy is a federal crime, so there could even be criminal charges.

A “nothing burger”?

Let’s see… oh my, what’s this? 17 U.S.C. § 504(c)(2):

In a case where the copyright owner sustains the burden of proving, and the court finds, that infringement was committed willfully, the court in its discretion may increase the award of statutory damages to a sum of not more than $150,000.

That’s per work infringed.
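
To put that per-work ceiling in perspective, here is a rough back-of-the-envelope sketch. The work counts are hypothetical and chosen only to show the scale, and the result is an upper bound rather than a prediction: the statute leaves the actual award to the court’s discretion, and it only reaches this ceiling if willfulness is proved.

```python
# Statutory ceiling for willful infringement under 17 U.S.C. § 504(c)(2),
# as quoted above: not more than $150,000 per work infringed.
WILLFUL_MAX_PER_WORK = 150_000


def max_willful_exposure(works_infringed: int) -> int:
    """Upper bound on statutory damages if every work drew the willful maximum."""
    if works_infringed < 0:
        raise ValueError("works_infringed must be non-negative")
    return works_infringed * WILLFUL_MAX_PER_WORK


# Hypothetical work counts (assumptions for illustration, not figures from any case).
for count in (1, 100, 10_000):
    print(f"{count:>6,} works -> up to ${max_willful_exposure(count):,}")
```

Even at the hypothetical 10,000 works, the theoretical ceiling is $1.5 billion.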

Nothing burger indeed.

OpenAI is on the receiving end of over two decades of fearmongering and lobbying to enact laws with ridiculous penalties for piracy in the digital age.

As for how we know where they got the information, that’s what subpoenas are for in a legal proceeding. Even if training information is not publicly disclosed, whether they did or didn’t pirate content is going to come out privately in court.

The AI doesn’t need to reproduce the book for OpenAI to have infringed in illegally sourcing the copyrightable material they used in training.

Saik0 · 1 year ago

You failed to read my post. You jumped straight into an assumption that piracy can be proved rather than actually reading what I’ve posted.

If you’re going to continue with strawman arguments then please return to reddit.

@kromem · 1 year ago

If piracy occurred, it can be proved by questioning employees under oath and subpoenaing relevant email records.

The idea that the court would need to reverse engineer ChatGPT to find out is absurd.