The Irony of 'You Wouldn't Download a Car' Making a Comeback in AI Debates

@FatCat · 4 months ago

@TommySoda · edit-2 4 months ago

Here’s an experiment for you to try at home. Ask an AI model a question, copy a sentence or two of what they give back, and paste it into a search engine. The results may surprise you.

And stop comparing AI to humans but then giving AI models more freedom. If I wrote a paper I’d need to cite my sources. Where the fuck are your sources ChatGPT? Oh right, we’re not allowed to see that but you can take whatever you want from us. Sounds fair.

BarqsHasBite · 4 months ago

Can you just give us the TLDE?

@[email protected] · 4 months ago

AI Chat bots copy/paste much of their “training data” verbatim.

@[email protected] · 4 months ago

Not to fully argue against your point, but I do want to push back on the citations bit. Given the way an LLM is trained, it’s not really close to equivalent to me citing papers researched for a paper. That would be more akin to asking me to cite every piece of written or verbal media I’ve ever encountered, as they all contributed in some small way to the way that the words were formulated here.

Now, if specific data were injected into the prompt, or maybe if it was fine-tuned on a small subset of highly specific data, I would agree those should be cited as they are being accessed more verbatim. The whole “magic” of LLMs was that it needed to cross a threshold of data, combined with the attention mechanism, and then the network was pretty suddenly able to maintain coherent sentence structure. It was only with loads of varied data from many different sources that this really emerged.

fmstrat · 4 months ago

This is the catch with OP’s entire statement about transformation. Their premise is flawed, because the next most likely token is usually the same word the author of a work chose.

@TommySoda · 4 months ago

And that’s kinda my point. I understand that transformation is totally fine, but these LLMs literally copy and paste shit. And that’s still if you are comparing AI to people, which I think is completely ridiculous. If anything these things are just more complicated search engines with half the usefulness. If I search online about how to change a tire I can find some reliable sources to do so. If I ask AI how to change a tire it would just spit something out that might not even be accurate and I’d have to search again afterwards just to make sure what it told me was even accurate.

It’s just a word calculator based on information stolen from people without their consent. It has no original thought process so it has no way to transform anything. All it can do is copy and paste in different combinations.

azuth · 4 months ago

It’s not a breach of copyright or other IP law not to cite sources on your paper.

Getting your paper rejected for lacking sources is also not infringing on your freedom. Being forced to pay damages and delete your paper from any public space would be infringement of your freedom.

@[email protected] · 4 months ago

I’m pretty sure citing sources isn’t really relevant to copyright violation: either you are violating it or you are not. Saying where you copied from doesn’t change anything, but if you are using some ideas with your own analysis and words it isn’t a violation either way.

@Eatspancakes84 · 4 months ago

With music this often ends up in civil court. Pretty sure the same can in theory happen for written texts, but the commercial value of most written texts is not worth the cost of litigation.

@TommySoda · 4 months ago

I mean, you’re not necessarily wrong. But that doesn’t change the fact that it’s still stealing, which was my point. Just because laws haven’t caught up to it yet doesn’t make it any less of a shitty thing to do.

@[email protected] · 4 months ago

When I analyze a melody I play on a piano, I see that it reflects the music I heard that day or sometimes, even music I heard and liked years ago.

Having parts similar or a part that is (coincidentally) identical to a part from another song is not stealing and does not infringe upon any law.

@takeda · 4 months ago

You guys are missing a fundamental point. Copyright was created to protect an author for a specific amount of time so somebody else doesn’t profit from their work, essentially stealing their deserved revenue.

LLM AI was created to do exactly that.

azuth · 4 months ago

It’s not stealing, it’s not even ‘piracy’, which also is not stealing.

Copyright laws need to be scaled back so they do not criminalize socially accepted behavior, not expanded.

@[email protected] · edit-2 4 months ago

The original source material is still there. They just made a copy of it. If you think that’s stealing then online piracy is stealing as well.

@TommySoda · 4 months ago

Well they make a profit off of it, so yes. I have nothing against piracy, but if you’re reselling it that’s a different story.

@[email protected] · 4 months ago

But piracy saves you money which is effectively the same as making a profit. Also, it’s not just that they’re selling other people’s work for profit. You’re also paying for the insane amount of computing power it takes to train and run the AI plus salaries of the workers etc.