Why is no one talking about how unproductive it is to have to verify every "hallucination" ChatGPT gives you?

@phoneymouse · 1 month ago

Why is no one talking about how unproductive it is to have to verify every "hallucination" ChatGPT gives you?

archomrade [he/him] · 1 month ago

Idk why we have to keep re-hashing this debate about whether AI is a trustworthy source or summarizer of information when it’s clear that it isn’t - at least not often enough to justify this level of attention.

It’s not as valuable as the marketing suggests, but it does have some applications where it may be helpful, especially if given a conscious effort to direct it well. It’s better understood as a mild curiosity and a proof of concept for transformer-based machine learning that might eventually lead to something more profound down the road but certainly not as it exists now.

What is really un-compelling, though, is the constant stream of anecdotes about how easy it is to fool into errors. It’s like listening to an adult brag about tricking a kid into thinking chocolate milk comes from brown cows. It makes it seem like there’s some marketing battle being fought over public perception of its value as a product that’s completely detached from how anyone actually uses or understands it as a novel piece of software.

sp3ctr4l · edit-2 1 month ago

Probably it keeps getting rehashed because people who actually understand how computers work are extremely angry and horrified that basically every idiot executive believes the hype and then asks their underlings to implement it, and will then blame them for doing what they asked them to do when it turns out their idea was really, unimaginably stupid, but idiot executive gets golden parachute and software person gets fired.

That, and/or the widespread proliferation of this bullshit is making stupid people more stupid, and just making more people stupid in general.

Or how like all the money and energy spent on this is actively murdering the environment and dooming the vast majority of our species, when it could be put toward building affordable housing or renovating crumbling infrastructure.

Don’t worry, if we keep throwing exponentially increasing amounts of effort at the thing with exponentially diminishing returns, eventually it’ll become God!

archomrade [he/him] · 1 month ago

Then why are we talking about someone getting it to spew inaccuracies in order to prove a point, rather than the decision of marketing execs to proliferate its use for a million pointless implementations nobody wants at the expense of far higher energy usage?

Most people already know and understand that it’s bad at most of what execs are trying to push it as, it’s not a public-perception issue. We should be talking about how energy-expensive it is, and curbing its use on tasks where it isn’t anything more than an annoying gimmick. At this point, it’s not that people don’t understand its limitations, it’s that they don’t understand how much energy it’s costing and how it’s being shoved into everything we use without our noticing.

Somebody hopping onto openAI or Gemini to get help with a specific topic or task isn’t the problem. Why are we trading personal anecdotes about sporadic personal usage when the problem is systemic, not individualized?

people who actually understand how computers work

Bit idea for moderators: there should be a site or community-wide auto-mod rule that replaces this phrase with ‘eat all their vegetables’ or something that is equally un-serious and infantilizing as ‘understand how computers work’.

sp3ctr4l · edit-2 1 month ago

Your original comment is posted under mine.

I am going to assume you are responding to that.

… I wasn’t trying to trick it.

I was trying to use it.

This is relevant to my more recent reply to you… because it is an anecdotal example of how broadly useless this technology is.

…

I wasn’t aware the purpose of this joke meme thread was to act as a policy workshop to determine an actionable media campaign aimed at generating mass awareness of the economic downsides of LLMs, which wouldn’t fucking work anyway because LLMs are being pushed by a class of wealthy people who do not fucking care what the masses think, and have essentially zero reason at all to change their course of action.

What, we’re going to boycott the entire tech industry?

Vote them out of office?

These people are on video, on record saying basically, ‘eh, we’re not gonna save the climate, not happening, might as well burn it all down even harder, even faster, for a tiny percentage chance our overcomplicated autocomplete algorithm magically figures out how to fix everything afterward’.

…

And yes, I very intentionally used the phrase ‘understand how computers actually work’ to infantilize and demean corporate executives.

Because they are narcissistic privileged sociopaths who are almost never qualified, almost always make idiotic decisions that will only benefit themselves and an increasingly shrinking number of people at the expense of the vast majority of people who know more and work harder than they do, and who often respond like children having temper tantrums when they are justly criticized.

Again, in the context of a joke meme thread.

Please get off your high horse, or at least ride it over to a trough of water if you want a reasonable place to try to convince it to drink in the manner in which you prefer.

archomrade [he/him] · 1 month ago

… I wasn’t trying to trick it.

I was trying to use it.

Err, I’d describe your anecdote more as an attempt to reason with it…? If you were using google to search for an answer to something and it came up with the wrong thing, you wouldn’t then complain back to it about it being wrong, you’d just try again with different terms or move on to something else. If ‘using’ it for you is scolding it as if it’s an incompetent coworker, then maybe the problem isn’t the tool but how you’re trying to use it.

I wasn’t aware the purpose of this joke meme thread was to act as a policy workshop to determine an actionable media campaign

Lmao, it certainly isn’t. Then again, had you been responding with any discernible humor of your own I might not have had reason to take your comment seriously.

And yes, I very intentionally used the phrase ‘understand how computers actually work’ to infantilize and demean corporate executives.

Except your original comment wasn’t directed at corporate executives, it appears to be more of a personal review of the tool itself. Unless your boss was the one asking you to use Gemini? Either way, that phrase is used so much more often as self-aggrandizement and condescension that it’s hard to see it as anything else, especially when it follows an anecdote of that person trying to reason with a piece of software lmao.

sp3ctr4l · 1 month ago

It is not that it responded “Sorry, I cannot find anything like what you described, here are some things that are pretty close.”

It affirmatively said “No, no such things as you describe exist, here are some things that are pretty close.”

There’s a huge difference between a coworker saying “Dang man, I dunno, I can’t find a thing like that.” and “No, nothing like that exists, closest to it is x y z,”

The former is honest. The latter is confidently incorrect.

Combine that with “Wait what about gamma?”

And the former is still honest, and the latter, who now describes gamma in great detail and how it meets my requirements, is now an obvious liar, after telling me that nothing like that exists.

If I now know I am dealing with a dishonest interlocutor, now I am forced to consider tricking it into being honest.

Or, if I am less informed or more naive, I might just, you know, believe it the first time.

A standard search engine that is not formatted to resemble talking to a person does not prompt a user to expect it to act like a person, and thus does not suffer from this problem.

If you don’t find what you’re looking for, all that means is you did not find it.

If you are told that no such thing exists, a lot of people are going to believe that no such thing exists.

That is typically called spreading disinformation, when the actor knows what they are claiming is false.

It’s worse than unhelpful, it actively spreads lies.

…

Anyway, I’m sorry that you don’t see humor in multibillion-dollar technology failing at achieving its purported abilities, I laugh all the time at poorly designed products, systems, things.

…

Finally, I did not use the phrase in contention in my original post.

I used it in my response to you, specifically and only within a single sentence which revolved around incompetent executives.

…

It appears that reading comprehension is not your strong suit, maybe you can ask Gemini about how to improve it.

Err, well, maybe don’t do that.

archomrade [he/him] · 1 month ago

reading comprehension

Lmao, there should also be an automod rule for this phrase, too.

There’s a huge difference between a coworker saying […]

Lol, you’re still talking about it like it’s a person that can be reasoned with bud. It’s just a piece of software. If it doesn’t give you the response you want you can try using a different prompt, just like if google doesn’t find what you’re looking for you can change your search terms.

If people are gullible enough to take its responses as given (or scold it for not being capable of rational thought lmao) then that’s their problem - just like how people can take the first search result from google without scrutiny if they want to, too. There’s nothing especially problematic about the existence of an AI chatbot that hasn’t been addressed with the advent of every other information technology.

@[email protected] · 1 month ago

to fool into errors

tricking a kid

I’ve never tried to fool or trick AI with excessively complex questions. When I tried to test it (a few different models over some period of time - ChatGPT, Bing AI, Gemini) I asked stuff as simple as “what’s the etymology of this word in that language”, “what is [some phenomenon]”. The models still produced responses ranging from shoddy to absolutely ridiculous.

completely detached from how anyone actually uses

I’ve seen numerous people use it the same way I tested it, basically a Google search that you can talk with, with similarly shit results.

archomrade [he/him] · 1 month ago

Why do we expect a higher degree of trustworthiness from a novel LLM than we do from any given source or forum comment on the internet?

At what point do we stop hand-wringing over llms failing to meet some perceived level of accuracy and hold the people using it responsible for verifying the response themselves?

There’s a giant disclaimer on every one of these models that responses may contain errors or hallucinations, at this point I think it’s fair to blame the user for ignoring those warnings and not the models for not meeting some arbitrary standard.

@[email protected] · edit-2 30 days ago

Why do we expect a higher degree of trustworthiness from a novel LLM than we do from any given source or forum comment on the internet?

The stuff I’ve seen AI produce has sometimes been more wrong than anything a human could produce. And even if a human would produce it and post it on a forum, anyone with half a brain could respond with a correction. (E.g. claiming that an ordinary Slavic word is actually loaned from Latin.)

I certainly don’t expect any trustworthiness from LLMs, the problem is that people do expect it. You’re implicitly agreeing with my argument that it is not just that LLMs give problematic responses when tricked, but also when used as intended, as knowledgeable chatbots. There’s nothing “detached from actual usage” about that.

At what point do we stop hand-wringing over llms failing to meet some perceived level of accuracy and hold the people using it responsible for verifying the response themselves?

at this point I think it’s fair to blame the user for ignoring those warnings and not the models for not meeting some arbitrary standard

This is not an either-or situation, it doesn’t have to be formulated like this. Criticising LLMs which frequently produce garbage is in practice also directed at people who do use them. When someone on a forum says they asked GPT and paste its response, I will at the very least point out the general unreliability of LLMs, if not criticise the response itself (very easy if I’m somewhat knowledgeable about the field in question). This is practically also directed at the person who posted that, e.g. by making them come off as naive and uncritical. (It is of course not meant as a real personal attack, but even a detached and objective criticism has a partly personal element to it.)

Still, the blame is on both. You claim that:

There’s a giant disclaimer on every one of these models that responses may contain errors or hallucinations

I don’t remember seeing them, but even if they were there, the general promotion and the ways in which LLMs are presented are trying to tell people otherwise. Some disclaimers are irrelevant for forming people’s opinions compared to the extensive media hype and marketing.

Anyway my point was merely that people do regularly misuse LLMs, and it’s not at all difficult to make them produce crap. The stuff about who should be blamed for the whole situation is probably not something we disagree about too much.

archomrade [he/him] · 29 days ago

The stuff I’ve seen AI produce has sometimes been more wrong than anything a human could produce. And even if a human would produce it and post it on a forum, anyone with half a brain could respond with a correction.

Seems like the problem is that you’re trying to use it for something it isn’t good or consistent at. It’s not a dictionary or encyclopedia, it’s a language model that happens to have some information embedded. It’s not built or designed to retrieve information from a knowledge bank, it’s just there to deconstruct and reconstruct language.

When someone on a forum says they asked GPT and paste its response, I will at the very least point out the general unreliability of LLMs, if not criticise the response itself (very easy if I’m somewhat knowledgeable about the field in question)

Same deal. Absolutely chastise them for using it in that way, because it’s not what it’s good for. But it’s a bit of a frequency bias to assume most people are using it in that way, because those people are the ones using it in the context of social media. Those who use it for more routine tasks aren’t taking responses straight from the model and posting it on lemmy, they’re using it for mundane things that aren’t being shared.

Anyway my point was merely that people do regularly misuse LLMs, and it’s not at all difficult to make them produce crap. The stuff about who should be blamed for the whole situation is probably not something we disagree about too much.

People misuse it because they think they can ask it questions as if it’s a person with real knowledge, or they are using it precisely for its convincing bullshit abilities. That’s why I said it’s like laughing at a child for giving a wrong answer or convincing them of a falsehood merely from passive suggestion - the problem isn’t that the kid is dumb, it’s that you’re (both yourself and the person using it) going in with the expectation that they are able to answer that question or distinguish fact from fiction at all.