destructdisc · 2 days ago

Wikipedia is one of the last genuine places on the Internet, and these rat bastards are trying to contaminate that, too

melsaskca@lemmy.ca · 8 hours ago

Finally! Now all of the “Scientology” histories will be safe! /s

ZDL@lazysoci.al · 12 hours ago

What is wrong in the techbrodude head that makes them only think of ruining things? Like it seems to me that they literally spend their days looking at things that are good and saying “what can I do to fuck this up for a profit?”

Should being a techie go into the DSM-V as a subheading under narcissistic personality disorder?

CileTheSane@lemmy.ca · 15 hours ago

“just tell your LLM not to do that”

You ever ask an LLM to modify a picture and “don’t change anything else”? It’s going to change other things.

Case in point: https://youtu.be/XnWOVQ7Gtzw

MML@sh.itjust.works · 15 hours ago

That’s why you always add “and no mistakes”

madjo@feddit.nl · 13 hours ago

Also “don’t hallucinate”

bless@lemmy.ml · 12 hours ago

And “don’t become self arrest”

Kuinox · 13 hours ago

You are mixing two kinds of AI, LLM and diffusion.
It’s way harder for a diffusion model to not change the rest, the first step of a diffusion model is to use a lossy compression to transform the picture into a soup of digits that the diffusion model can understand.

CileTheSane@lemmy.ca · 5 hours ago

And an LLM will convert a prompt into a bunch of tokens the model can understand.

Kaz@lemmy.org · 1 day ago

These fuckin AI “enthusiasts” are just making the rest of the world hate AI more.

Losers who can’t achieve anything without AI are just going to keep doing this shit.

thedeadwalking4242 · 23 hours ago

Fr if they just let it go instead of forcing it on everyone people might even be enthusiastic.

JackBinimbul@lemmy.blahaj.zone · 1 day ago

I am so goddamned tired of AI being shoved into every collective orifice of our society.

markstos · 1 day ago

Congrats on inventing what high school students figured out a year ago to skirt AI homework detectors.

cheesybuddha · 1 day ago

So they are using AI to make it so AI can’t detect that they are using AI?

What kind of technological ouroboros of nonsense is this?

DacoTaco · 10 hours ago

It gets better. Using llms to check if the output of an llm is hallucinated or not! They call it a judge and it’s funny as hell tbh

Bakkoda · 21 hours ago

Magic

Saapas@piefed.zip · 14 hours ago

Well yeah AIs learn stuff. This is a neverending fight

SchwertImStein@lemmy.dbzer0.com · 14 hours ago

llms do not learn, they are a spicy keyboard autocomplete

Saapas@piefed.zip · 13 hours ago

Even my keyboard autocomplete learns from my typing

I Cast Fist@programming.dev · 8 hours ago

Unlike llms, who will get all confused whenever you ask for a non-existent emoji, over and over and over again, instead of realizing that “there is no seahorse emoji”

Saapas@piefed.zip · 8 hours ago

It does have weird hangups and stuff it just stubbornly refuses to listen to others on. Very lifelike lmao

SchwertImStein@lemmy.dbzer0.com · 13 hours ago

you and I have very different definitions of learning

Saapas@piefed.zip · 13 hours ago

I guess so

minorkeys · 1 day ago

It’s an arms race, AI identification vs AI adaptation. I wonder which side the companies that own these LLMs want to win…

elfin8er · 1 day ago

They don’t want anyone to win. The arms race makes money.

Avid Amoeba@lemmy.ca · 1 day ago

From the repo:

Have opinions. Don’t just report facts - react to them. “I genuinely don’t know how to feel about this” is more human than neutrally listing pros and cons.

durindana@lemmy.zip · 17 hours ago

lol brilliant

JcbAzPx · 1 day ago

That will at least be easy to spot in a Wikipedia entry.

Jayjader@jlai.lu · 1 day ago

I really despise how Claude’s creators and users are turning the definition of “skill” from “the ability to use [learned] knowledge to enhance execution” into “a blurb of text that [usefully] constrains a next-token-predictor”.

I guess, if you squint, it’s akin to how biologists will talk about species “evolving to fit a niche” amongst themselves or how physicists will talk about nature “abhorring a vacuum”. At least they aren’t talking about a fucking product that benefits from hype to get sold.

prole@lemmy.blahaj.zone · 1 day ago

I can’t help but get secondhand embarrassment whenever I see someone unironically call themselves a “prompt engineer”. 🤮

moonshadow@slrpnk.net · 23 hours ago

Sloperator

captainlezbian · 1 day ago

Hey, they had to learn thermodynamics and spend 3 semesters in calculus to write those prompts

m4xie@lemmy.ca · 1 day ago

I’m a terrible procrastinator engineer.

OctopusNemeses · 1 day ago

Isn’t this a thing that authoritarians do? They co-opt language. It’s the same thing conservatives do. The venn diagram of tech bros and the far right is too close to being a circle.

You can pretty much put any word out of the dictionary into a search engine and the first results are some tech company that took the word either as their company name or redefined it into some buzzword.

chuckleslord · 1 day ago

Skills were functions/frameworks built for Alexa, so they just appropriated the term from there.

gmtom · 1 day ago

Wikipedia has already partnered with AI companies to help train their LLMs.

thespcicifcocean · 14 hours ago

Honestly, I think that training on wikipedia is probably better than training from reddit. At least on wikipedia the llm might get some factual information

Phoenix3875 · 2 days ago

You do understand this is more akin to white hat testing, right?

Those who want to exploit this will do it anyway, except they won’t publish the result. By making the exploit public, the risk will be known if not mitigated.

unepelle@mander.xyz · 1 day ago

I’m admittedly not knowledgeable in White Hat Hacking, but are you supposed to publicize the vulnerability, release a shortcut to exploit it telling people to ‘enjoy’, or even call the vulnerability handy?

teft@piefed.social · 1 day ago

Responsible disclosure is what a white hat does. You report the bug to whomever is the party responsible for patching and give them time to fix it.

PlexSheep@infosec.pub · 1 day ago

That sort of depends on the situation. Responsible disclosure is for if there is some relevant security hole that is an actual risk to businesses and people, while this here is just “haha look LLMs can now better pretend to write good text if you tell it to”. That’s not really responsible disclosurable. It’s not even specific to one singular product.

FooBarrington · 1 day ago

Considering the “vulnerability” here is on the level of “don’t use password as your password” - yeah, releasing it all is exactly the right step.

Lumidaub@feddit.org · 2 days ago

Seeing as OpenAI struggled to make its AI avoid the em dash and still hasn’t entirely managed to do it, I’m not too worried.

jimmy90 · 11 hours ago

i’m fine with LLM contributions to wikipedia as long as they have references and are human validated

it’s actually something that LLMs can potentially do quite well

FiniteBanjo@feddit.online · 2 days ago

TBF OpenAI are a bunch of idiots running the world’s largest ponzi scheme. If DeepMind tried it and failed then…

Well I still wouldn’t be surprised, but at least it would be worth citing.

chickenf622@sh.itjust.works · 2 days ago

I think the inherent issue is that the current “AI” is inherently non-deterministic, so it’s impossible to fix these issues totally. You can feed an AI all the data on how to not sound AI, but you need massive amounts of non-AI writing to reinforce that. With AI being so prevalent nowadays you can’t guarantee a dataset is AI free, so you get the old “garbage in garbage out” problem that AI companies cannot solve. I still think generative AI has its place as a tool, I use it for quick and dirty text manipulation, but it’s being applied to every problem we have like it’s a magic silver bullet. I’m ranting at this point and I’m going to stop here.

vala@lemmy.dbzer0.com · 1 day ago

FWIW, LLMs are deterministic. Usually the commercial front-ends don’t let you set the seed but behind the scenes the only reason the output changes each time is that the seed changes. If you set a fixed seed, input X always leads to output Y.
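
To make the seed point concrete, here is a minimal sketch in Python. It is not a real LLM: `VOCAB`, `WEIGHTS`, and `sample_tokens` are made-up names standing in for a model's next-token distribution. It only shows that a seeded pseudo-random sampler returns the same sequence for the same seed. Hosted models may still vary between runs for other reasons (batching, floating-point ordering on GPUs), so treat this as an illustration of the sampling step only.

```python
import random

# Hypothetical toy "model": one fixed next-token distribution.
VOCAB = ["the", "cat", "sat", "on", "mat"]
WEIGHTS = [5, 3, 2, 1, 1]


def sample_tokens(seed: int, n: int = 8) -> list[str]:
    """Draw n tokens using a private RNG initialised with the given seed."""
    rng = random.Random(seed)
    return rng.choices(VOCAB, weights=WEIGHTS, k=n)


same_a = sample_tokens(seed=42)
same_b = sample_tokens(seed=42)
other = sample_tokens(seed=43)

print(same_a == same_b)  # same seed, same input: identical output
print(other)             # a different seed is free to produce something else
```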

ThirdConsul@lemmy.zip · 6 hours ago

From the user perspective: no? I think they called that “temperature” and even setting that to 0 didn’t make the result the same the next day after cache cleared.

FiniteBanjo@feddit.online · 2 days ago

I honestly disagree that it has any use. Being a statistical model with high variance makes it a liability; no matter which task you use it for, it will produce worse results than a human being and will create new problems that didn’t exist before.

BarrelAgedBoredom@lemmy.zip · 21 hours ago

I use it to put together study guides so that, instead of spending a bunch of time typing and formatting, I’m spending it studying. It’s fed directly from my notes and slides and it rarely gets anything wrong (I read through it twice and cross reference with my notes). If anything, I’m usually removing stuff for being unnecessary or rewording things here and there to be better suited to me. What took several hours now takes 30-45 minutes

Don’t take this as a defense of AI, it definitely isn’t. If AI disappeared tomorrow the world would be better off. Formatting study guides is literally the only utility I’ve found in LLMs

FiniteBanjo@feddit.online · 21 hours ago

Any amount of using AI for learning is learning from Slop which will make you less informed.

BarrelAgedBoredom@lemmy.zip · 20 hours ago

Tell that to my 3.9 GPA lol. I’m not learning from slop, I’m using a program to format my notes and slides from my lectures, and then verifying that information before committing it to memory

FiniteBanjo@feddit.online · 19 hours ago

Man have I got a solution for you, check out this formatted method to accomplish the same task:

I’m ~~using a program to format my notes and slides from my lectures, and then~~ verifying that information before committing it to memory

Cethin@lemmy.zip · 2 days ago

If you’re running it locally you can set how much variance it has. However, I mostly agree, in that it creates a bunch of trash. This doesn’t mean it has no use though. It’s like the monkeys on a typewriter thought experiment, but the monkey’s output is fairly constrained so it takes far fewer attempts to create what you want. It depends on the complexity of the solution required whether it’ll come up with a good solution in a reasonable number of tries. If it’s a novel solution, it probably never will, because it’s constrained to solutions it’s seen before.

chickenf622@sh.itjust.works · 2 days ago

The high variance is why I only use it for dead simple tasks, e.g. “create an array of US state abbreviations in JavaScript”, otherwise I’m in full agreement with you. If you can’t verify the output is correct then it’s useless.

GojuRyu · 1 day ago

Wouldn’t that be slower to do, simply because checking it got all states, didn’t repeat any and didn’t make up any would be slower than copying a list from the web and quickly turning that into an array by hand with multiline cursors?

msage@programming.dev · 2 days ago

Why would you have this use for multi-billion dollar earth scorching torment nexus?

eleijeep@piefed.social · 1 day ago

That’s like one web search and then one shell command. You can probably just copy paste a column of a table from wikipedia and then run a simple search/replace in your text editor. Why are you feeding the orphan crushing machine for this?

bridgeenjoyer@sh.itjust.works · 1 day ago

Because it’s .01% easier to do this.

Also many people laugh at you if you try to say how ai is destroying the environment for no reason. Doesn’t affect them, you go live in a cave you luddite!

hector@lemmy.today · 1 day ago

AI is useful for sorting datasets and pulling relevant info in some cases, i.e. ProPublica has used it for articles.

Obviously simple sorting for them, case law is too complicated for such sifting of data, it was trained on reddit after all.

FiniteBanjo@feddit.online · 1 day ago

And when, not if but when, it makes a mistake by pulling hallucinated info or data then it’s going to be your fault, that’s why it’s a liability.

hector@lemmy.today · 1 day ago

The simple stuff it can do, trying to remember how propublica used it, but it was just like sifting through a database and pulling out all mentions of a word.

When you get into giving case law, it’s way too complicated for it and it hallucinates.

ThirdConsul@lemmy.zip · 6 hours ago

You’re describing RAG, the others are describing LLMs.

eleijeep@piefed.social · 22 hours ago

sifting through a database and pulling out all mentions of a word.

You mean keyword search that has existed since the beginning of time?

frank@sopuli.xyz · 1 day ago

I think the best use is “making filler” so like in a game, having some deep background shit that no one looks at, or making a fake advertisement in a cyberpunk type game. Something to fill the world out that reduces the work of real artists if they choose to

FiniteBanjo@feddit.online · 1 day ago

If you can’t be bothered to write filler then it’s an insult for you to expect others to read it. You’re just wasting people’s time.

frank@sopuli.xyz · 1 day ago

I guess the point is for people to not read the filler.

I think of the text that’s too small to read on a computer in the background. It’s nice that it’s slightly more real looking than a copy/paste screen.

Not even close to worth destroying the environment over, but it’s a neat use case to me

Catoblepas@piefed.blahaj.zone · 1 day ago

I think of the text that’s too small to read on a computer in the background.

Lorem ipsum has been used in typesetting since the 60s. If it’s not meant to be read, it doesn’t matter if it’s lorem ipsum text.

Not trying to dogpile you, I just think even things that seem ‘useful’ for LLMs almost always have preexisting solutions that are decades old.

homura1650 · 2 days ago

Datasets are not the only mechanism to train AI. You can also use reinforcement learning. This requires you to have a good fitness function. In some domains, that is not a problem. For LLMs, however, we do not have such a function. However, we can use a hybrid approach, where we train a model based on a data set and optimizing for fitness functions that address part of what we want (e.g. avoiding em dashes). In practice, this tends to be tricky, as ML tends to be a bit too good at optimizing for fitness functions, and will often do it in ways you don’t want. This is why if you want to develop a real AI product, you actually need AI engineers who know what they are doing; not prompt engineers who will try and find the magic incantation that makes someone else’s AI do what they want.
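
A toy sketch of the "too good at optimizing" problem, in Python. Everything here is hypothetical: the `reward` function and the candidate sentences are invented for illustration, and no real training loop is involved. The fitness function only counts em dashes, so it cannot tell a clean rewrite apart from a sentence that swaps in double hyphens, or from an empty string.

```python
EM_DASH = "\u2014"


def reward(text: str) -> int:
    """Hypothetical fitness function: penalises em dashes and nothing else."""
    return -text.count(EM_DASH)


# Made-up candidate outputs a model might produce for the same prompt.
candidates = [
    f"Wikipedia is reliable {EM_DASH} mostly {EM_DASH} because it cites sources.",
    "Wikipedia is reliable, mostly, because it cites sources.",
    "Wikipedia is reliable -- mostly -- because it cites sources.",
    "",  # degenerate output: says nothing, but contains no em dashes
]

best_score = max(reward(c) for c in candidates)
top_scoring = [c for c in candidates if reward(c) == best_score]

for c in top_scoring:
    print(repr(c))  # all of these look equally "good" to the fitness function
```

An optimizer chasing this score has no reason to prefer the sentence a human would pick, which is why a partial fitness function usually needs to be combined with other signals.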

hector@lemmy.today · 1 day ago

We should crowdsource a program to sniff out ai data crawlers, then poison the data they harvest without them knowing, for companies to employ.

0_o7@lemmy.dbzer0.com · 1 day ago

You have to understand that their public facing product is not the same as the one they allow enterprise or state actors to use.

They benefit from the public thinking they have these stupid limitations; it gives them more space to curate their product offerings where the real money is made.

Lumidaub@feddit.org · 1 day ago

I don’t understand how the public thinking these are bad products is an incentive for especially state actors to use them. That seems counterintuitive.

udon · 2 days ago

If these “signs of AI writing” are merely linguistic, good for them. This is as accurate as a lie detector (i.e., not accurate) and nobody should use this for any real world decision-making.

The real signs of AI writing are not as easy to fix as just instructing an LLM to “read” an article to avoid them.

As a teacher, all of my grading is now based on in person performances, no tech allowed. Good luck faking that with an LLM. I do not mind if students use an LLM to better prepare for class and exams. But my impression so far is that any other medium (e.g., books, youtube explanation videos) leads to better results.

Randelung · 1 day ago

I sucked in oral exams and therefore hated them. Then again, if they had been mixed into regular school, it might not have sucked so much.

prole@lemmy.blahaj.zone · 1 day ago

Doesn’t need to be oral, I remember occasionally having exams that were essay questions that needed to be answered in class.

udon · 1 day ago

I do both of these as well as smaller but more frequent tests, group work, project work over several sessions etc… The only things I stopped doing are reports to write at home, paper summaries etc. Doesn’t make sense anymore.