Congress told AI firms should pay for copyrighted content

@[email protected] · 10 months ago

Congress told AI firms should pay for copyrighted content

@grue · 10 months ago

Never mind paying for content; AI firms should be required to abide by the terms of copyleft-licensed training data. In other words, all output of an AI trained on a dataset containing even a single copyleft work should be required to be copyleft itself.

@[email protected] · 10 months ago

This

@[email protected] · 10 months ago

Yeah, that won’t work. Any country that lets its companies use whatever data they want will flat out have better models, potentially meaning greater economic output for the whole country if AI is as big as I think. Unfortunate, but I don’t see an alternative yet, unless they make it so you can use any data but models have to be free.

@grue · 10 months ago

The alternative is to take the good training data and then accept the output being copyleft without whining about it.

@a2800276 · 10 months ago

Doesn’t that argument apply to any instance of ignoring intellectual property? Books, records and movies will also be cheaper in countries that let companies do what they want. Medicine would be more accessible; ignoring patents will greatly accelerate innovation in countries where permitted…

@[email protected] · 10 months ago

You don’t have to pay the rightsholder if your hired human reads various newspapers in order to learn how to write. Or at least no more than a single person’s subscription fee to said content.

So why the hell should you have to pay more to train an AI model on the same content?

It’s faster than a human? So what? Why does that entitle you to more money? There are fast and slow humans already, and we don’t charge them differently for access to copyrighted material.

The tool that’s being created is used by more than one human/organization? So what? Freelance journalists write for many publications after having learned on your material. You aren’t charging them a license fee for every org they write for.

That being said, this is one of those turning points in the world where it doesn’t matter what the results of these lawsuits are: this technology is going to use copyrighted material whether it’s licensed or not. Companies will just need to adapt to the new reality.

OpenAI and other large companies are the target right now, but the much smaller open source generative AI models are catching up fast, and there’s no way to stop individuals using copyrighted material to train or personalize their AI. Currently it’s processing intensive to train, but it’s already dropped in price by orders of magnitude, and it’s going to keep getting cheaper as computing hardware gets better.

If all you see is the article written by Joe Guy, and it’s a good article with useful information, you can’t prove that Joe even used a tool most of the time, let alone that the tool was trained on a specific piece of copyrighted material, especially if everyone’s training for their AI is a little bit different. Unless it straight up plagiarizes, no court is going to convict Joe. Avoiding direct plagiarism is as easy as having a plagiarism tool double check against the original training material.

@[email protected] · 10 months ago

Why not just use free use content? There’s plenty of it. More than ever before in human history

@[email protected] · 10 months ago

I’m not actually sure you’re accurate with your statement. Prior to copyright law being introduced, everything was free use.

These days, anything a human produces immediately becomes copyrighted. Every post you make, every podcast you record, every doodle on a napkin, every Instagram post, every speech you deliver…

You actually have to intentionally license it for free use, which almost nobody does.

@echo64 · 10 months ago

There are large amounts of freely licensed content.

@[email protected] · 10 months ago

There is enough freely licensed content to make whatever you want. I have no trouble at all making websites and comic books and video games using freely licensed content.

@[email protected] · 10 months ago

You were trained on copyrighted materials. You’ve read copyrighted websites, comic books, and video games.

@[email protected] · 10 months ago

Yeah I paid for em too

@[email protected] · 10 months ago

You paid your own money for every single copyright work you’ve ever seen in your life?

No you did not. Not even close.

You didn’t pay for it at school because a lot of it falls under educational fair dealing rules. You didn’t pay when you borrowed that video game from your friend, or when you read a graphic novel at the library.

And you definitely didn’t pay for every news article you’ve read online.

@[email protected] · 10 months ago

“You paid your own money for every single copyright work you’ve ever seen in your life?”

I never claimed this distinction, and I don’t think it’s a meaningful point.

I’m saying that I pay for art. These companies don’t, but more to the point, they seek to undermine their source once they’ve extracted all the training data they need. I’d go so far as to say it’s in poor taste to use free art, because it should be patently obvious that most artists putting out free art did not anticipate its use by devices that let you bypass artists entirely.

There’s an alternate way that this could have all gone down: after some internal testing, we could have simply asked artists to volunteer their work for the project of training. There are enough people excited about the tech that this would have been plenty! It just wouldn’t have let companies rush for market share, and hope the business utility would gloss over any ethical qualms in the aftermath.

@[email protected] · edit-2 10 months ago

With the right tools and resources all content on the internet becomes freely available.

@echo64 · 10 months ago

If AIs were capable of invention and creation, I might agree. But they aren’t. They regurgitate what they are modeled on.

We don’t teach AIs, they don’t learn, there’s no university, there’s no fundamentals. We just have models that reproject. They take the training data, mix it all up, and then project it out again.

There is use to that, but GPT isn’t a child. It cannot learn, comprehend, or understand. It’s a tool, and as a tool, it depends heavily on the work created by others.

@[email protected] · edit-2 10 months ago

You overestimate humans with your argument.

You couldn’t even make your comment right now if a teacher hadn’t taught you English, you couldn’t have typed it if engineers hadn’t created computers and keyboards, you couldn’t have posted it without network technicians who set up and run the internet…

The modern world is literally the combined work of billions of humans.

If you were left alone as a baby, you’d be dead.

Sneezycat · edit-2 10 months ago

Okay so according to your logic, it is impossible for us to have this conversation. No human could’ve invented those things, therefore they can’t exist.

Or are you saying humans can learn, but our capacity for that is greatly amplified by the knowledge humanity gave us?

If it’s the latter, yeah, we’re standing on the shoulders of giants. But AI is fundamentally different, that’s the point of the comment above.

AI could never in however many million years get to the point humanity has gotten to, because we humans learn, and AIs don’t. They would stagnate without humans even if they could train from each other.

@[email protected] · 10 months ago

How is it fundamentally different?

AIs can learn; it’s totally possible to set up one that will remember what you’ve said in future answers. The fact that they don’t have their own agency is irrelevant. They still learn the same way we do, by looking at a ton of examples, and then trying and receiving feedback from trainers.

Let’s say you record yourself walking in a mall, and the mall has copyrighted music playing in the background. Do you need to pay for a license to make that recording? No. You would only need to pay if you then played the video in a commercial context.

So the recording, processing, and storing of the copyrighted material does not require a license. Only the playback does, which in the case of AI doesn’t happen: they don’t produce exact copies of copyrighted material because the exact copyrighted data isn’t actually stored inside them. Their outputs may be similar in nature, but revealing the plot of a book you read to others is not considered copyright infringement.

@echo64 · 10 months ago

No, humans learn, AIs regurgitate. There is a distinct difference.

@Jozzo · 10 months ago

You can’t really compare them like that; learning is an input and regurgitating is an output.

Humans learn and regurgitate much the same as an AI learns and regurgitates.

A human can only output things based on input it’s received in the past. Try imagining a new color. Any color you could possibly come up with is just some combination of colors that already exist. By painting with purple are you not “regurgitating” the work of red and blue?

AutoTL;DR · 10 months ago

This is the best summary I could come up with:

The New York Times recently sued OpenAI, accusing the startup of unlawfully scraping “millions of [its] copyrighted news articles, in-depth investigations, opinion pieces, reviews, how-to guides and more.”

Danielle Coffey, CEO of the News/Media Alliance trade association, noted that chatbots designed to crawl the web and act like a search engine, like Microsoft Bing or Perplexity, can summarize articles too.

Readers could ask them to extract and condense information from news reports, meaning there would be less incentive for people to visit publishers’ sites, leading to a loss of traffic and ad revenue.

Jeff Jarvis, who recently retired from the City University of New York’s Newmark Graduate School of Journalism, is against licensing for all uses and was afraid it could set precedents that would affect journalists and small, open source companies competing with Big Tech.

Revealing their sources might make their AI tools look bad too, considering the amount of inappropriate text their models have ingested, including people’s personal information and toxic or NSFW content.

“The notion that the tech industry is saying that it’s too complicated to license from such an array of content owners doesn’t stand up,” said Curtis LeGeyt, president and CEO of the National Association of Broadcasters.

The original article contains 877 words, the summary contains 202 words. Saved 77%. I’m a bot and I’m open source!

Pika · 10 months ago

Meanwhile, Japan stated that trademark and copyright don’t exist in the AI world. Man, the two sides are spread wide.

@[email protected] · 10 months ago

I emailed my senators to tell them not to support this bill, maybe it’ll help