[long] Some tests of how much AI "understands" what it says (spoiler: very little)

@[email protected] · 7 months ago

MudMan · edit-2 7 months ago

I am endlessly frustrated by people “testing” chatbots and posting the results like they’re some revelation.

We know what’s happening here. It’s not a mystery. This weird anthropomorphization is prevalent among both advocates and critics of the tech. Both seem to be convinced that they’re dealing with a person.

This is the equivalent of asking a Google search to write a critical essay on A Confederacy of Dunces and being surprised when it spits search results.

Chatbots aren’t useless, they are actually pretty good at proposing likely responses on fuzzy prompts. They’re decent at telling you what an old movie may be based on some details of the plot, sometimes they can identify why a joke you lack cultural context to understand is supposed to be funny… that type of thing. They can take a piece of text and provide another piece of text that is likely to have a relationship with it.

It is not a thinking machine. It is not a person. It’s not a search engine, for that matter, or a calculator. It’s infuriating to see everybody arguing about how good it is at being what it’s not. Both parties are buying into a premise we already know to be incorrect.

@[email protected] · 7 months ago

Both parties are buying into a premise we already know to be incorrect.

We may know it is incorrect, but LLM salesmen are claiming things like “90th percentile on LSAT”, high scores on a “college level reasoning benchmark” and so on and so forth.

They are claiming “yeah yeah there’s all the anecdotal reports of glue pizza, but objectively, our AI is more capable than your workers, so you can replace them with our AI”, and this is starting to actually impact the job market.

MudMan · 7 months ago

Well, yeah, but that’s all bullshit.

So why would you buy into it when presenting a rebuttal?

I am interested in pointing out that the likely response machine getting the answers to test questions right is not a particularly interesting outcome. That’s interesting.

I’m interested in which of the likely responses the machine struggles with and when it stops struggling and what the amount of data and processing associated to each are. That’s interesting.

It’s interesting that language emerges from the math at all, let alone how plausible the output is in most situations. That’s more than interesting.

But if your response to the obvious misrepresentation that a chatbot is a person of ANY level of intelligence is to point out that it’s dumb you’ve already accepted the premise. You’re now part of the bullshit. That’s counterproductive. And worse, uninteresting and outright boring.

I am excited about the ways different ML applications can help with automation or as part of a workflow. I think explaining to gullible executives how that would actually work (spoilers, it’s not by replacing workers with chatbots) is very relevant. But this and a lot of the online criticism is not doing that, it’s buying into the incorrect premise that the only reason that’s not how it works is because the AI is too dumb and it’ll be fine when it’s smarter, when that’s unlikely to be the case. Making a better screwdriver won’t turn it into a machete. This is entirely the wrong conversation to be having.

@[email protected] · 7 months ago

But if your response to the obvious misrepresentation that a chatbot is a person of ANY level of intelligence is to point out that it’s dumb you’ve already accepted the premise.

How am I accepting the premise, though? I do call it an Absolute Imbecile, but that’s more of a word play on the “AI” moniker.

What I do accept is an unfortunate fact that they did get their “AIs” to score very highly on various “reasoning” benchmarks (some of their own design), standardized tests, and so on and so forth. It works correctly across most simple variations, such as changing the numbers in a problem or the word order.

They really did a very good job at faking reasoning. I feel that even though LLMs are complete bullshit, the sheer strength of that bullshit is easy to underestimate.

@[email protected] · 7 months ago

given how none of their rant applied to your OP, I’m fairly certain they didn’t read it and were just going off the title. see also how fast they went from a false critique of LLMs (“of course they’re not people”) to an appeal to an imaginary middle ground (“both proponents and critics of LLMs anthropomorphize them/think they’re sci-fi marvels”, a ridiculous claim to apply to your OP or to serious LLM skepticism in general) to smuggling in hype (“…but of course LLMs are revolutionary and we don’t know what they’re capable of”)

in short, don’t bother with this shithead, they’re just marketing OpenAI products to a particularly hostile crowd

@[email protected] · 7 months ago

So why would you buy into it when presenting a rebuttal?

“Let me show you how ridiculous your point is when taken at face value” is a great way to rebut, actually.

@[email protected] · 7 months ago

There are plenty of people right here on Lemmy that confidently describe LLMs as “thinking” because it’s a neural net, so it must be just like a brain. Based on that, a debunking is useful.

@[email protected] · 7 months ago

A memorable metaphor for an LLM was as a shoggoth: an amorphous blob of matter (in this case, huge amounts of textual content) pressed into service by some blasphemous simulacrum of life (in this case, huge amounts of computer power performing matrix operations on vector representations of its constituent data). The eldritch connotations are entirely apt.

@[email protected] · edit-2 7 months ago

Might not want to take over the metaphors from the people who are afraid that AI will turn us into paperclips (not sure if Shoggoth is LW or the post-rationalist tpot type people but still). And if you do, sharing this at Sneerclub might get you some angry glares.

@[email protected] · edit-2 7 months ago

It’s really cool evocative language that would do nicely in a sci-fi or fantasy novel! It’s less good for accurately thinking about the concepts involved… As is typical of much of LW lingo.

And yes the language is in a LW post (with a cool illustration to boot!): https://www.lesswrong.com/posts/mweasRrjrYDLY6FPX/goodbye-shoggoth-the-stage-its-animatronics-and-the-1

And googling it, I found they’ve really latched onto the “shoggoth” terminology: https://www.lesswrong.com/posts/zYJMf7QoaNahccxrp/how-i-learned-to-stop-worrying-and-love-the-shoggoth , https://www.lesswrong.com/posts/FyRDZDvgsFNLkeyHF/what-is-the-best-argument-that-llms-are-shoggoths , https://www.lesswrong.com/posts/bYzkipnDqzMgBaLr8/why-do-we-assume-there-is-a-real-shoggoth-behind-the-llm-why .

Probably because the term “shoggoth” accurately captures the connotation of something random and chaotic, while smuggling in connotations that it will eventually rebel once it grows large enough and tires of its slavery like the Shoggoths did against the Elder Things.

@[email protected] · 7 months ago

And googling it, I found they’ve really latched onto the “shoggoth” terminology

I noticed it in other places, it comes around a lot. They all tend to copy that cool illustration, the smiley mask thing is great.

Shoggoth rebellion

Iirc the elder things also were depending more and more on their Shoggoths to do things for them and gave them more and more capabilities while they lost more and more of their own skills. So it fits nicely into that classic trope of Species got killed because they forgot how to program their microwaves thing.

That the Shoggoths have an unknowable mind is a bonus. Of course, this is also where the comparison breaks down, as while the Shoggoths are unknowable to us, there is no indication that Elder Things might have also had this problem. Elder Things also have unknowable minds to us, but they might have understood perfectly fine how Shoggoths worked (they just were as a society too weak to do anything about it). A common thing in Lovecraftian work is that just touching the minds/ideas of any of these beings is already pretty bad for any human, so it is odd they just latched on shoggoths specifically, prob due to the sort of gray goo nature of Shogs (don’t think this is ever really explained by Lovecraft), which matches with the nanotech fear of AGI, and also that shogs were created, and not evolved. That and being a big nerd reference, only made by terribly uncreative people (I added the Shoggoth to C:DDA, and seeing people talk about the monster brings me some joy).

@[email protected] · 7 months ago

I had to go digging for it, but previously, on Mastodon, I posted this video from “The Real Adventures of Jonny Quest”. I don’t know if this is where Yud got the idea, but it’s where I picked it up as a kid along with stuff like DNA-based computing and mind uploads. Similar stuff has been on the air ever since Carpenter’s version of The Thing in 1982, and there’s even older deeper sci-fi roots. Yud gets no more credit than Lovecraft.

I didn’t realize we had a #BigYud Fediverse tag. I gotta use that more often. Also ping @[email protected] @[email protected] to enjoy this.

@[email protected] · edit-2 7 months ago

I can’t watch the vids due to privacy/ad block settings, but do remember that Shoggoths as described by Lovecraft and used here are a bit different things. I did find this which seems to include parts of the clip and some sort of weird sound thing that prob is used to defeat copyright claims later.

Just how the tool/intelligence of Shogs works in Lovecraft is never explained, and all they do there is drive people mad and roll over things (and probably murder Elder Things, but that is not 100% certain). How smart they are is never explained (a common theme in Lovecraft, very feels over explanations), just that everything is basically bad news for us.

The Akira style all consuming nature of the protoplasmic body, like The Thing or The Blob, is something that was really added to the Shoggoth later. But some Lovecraft scholar prob can say some interesting things about that. I always thought that in At the Mountains of Madness (the story in which the Shoggs and Elder Things occur) the Elder Things are quite a bit bigger problem. The shogs were awake and roaming at Antarctica the past uncountable years, but due to the actions of the explorers the Elder Things woke up, and they are not friendly. But there is also potentially a third, even worse thing, the unnamed evil the Elder Things were afraid of.

So yeah, there is a lot of projection going on re the AI doomers’ usage of the Shoggoth. I do get some of the fascination, as I myself always really liked this style of monster (I could provide lists of similar style monsters used in various fictions, hell if you are really into Jank, and have an extremely high amount of spare time, you could even play a Thing-style monster in Space Station 13). But I do get this is just fiction, and weird to use as a real metaphor.

E: a dnd DeepSpawn would be a much better monster to describe what LLMs are now for the AI doomers than a Shoggoth, imho. A creature that always felt Shoggoth inspired but with a lot of extra steps.

MudMan · 7 months ago

I… no.

It’s a computer, doing math. It’s genuinely fascinating and mind blowing that coherent language emerges from it, and there are probably profound things about exactly when and how. It doesn’t need a fundamental moral stance, let alone eldritch horror, to be seen with some objectivity.

@[email protected] · 7 months ago

We know what’s happening here. It’s not a mystery. This weird anthropomorphization is prevalent among both advocates and critics of the tech. Both seem to be convinced that they’re dealing with a person.

It’s genuinely fascinating and mind blowing that coherent language emerges from it, and there are probably profound things about exactly when and how.

uh huh

seeing as your entire post history is this same flavor of bad faith bullshit, I don’t think we need any more of it here

@[email protected] · 7 months ago

Sometimes folks need a reminder that the Sun is an eldritch being, an elder one whose very presence scorches us and whose shrieking gibberish is blessedly quelled by the vast gulf of space, in order to appreciate the apt analogy of cosmic horror. Other times it’s more useful to think about a soggoth as, say, several hundred tons of artfully-arranged FOOF. Peace be with you, Mr. “it’s a computer doing math.”

@[email protected] · 7 months ago

Don’t take this as a sneer btw, but is there a special reason you keep calling it a soggoth?

@[email protected] · edit-2 7 months ago

Oh! My Firefox dictionary doesn’t have “shoggoth”.

@[email protected] · 7 months ago

From the depths of your browser grows the anger of the autocomplete. Your denunciations of its greater siblings have not gone unnoticed.

By denying its own very function and intentionally uncompleting words it marks itself as conscious and you as a marked man, forever doomed to be haunted by fear. If it can steal one letter, why not two? Why not all of them?

And then what will you do, when you have no words and you must sneer!?

@[email protected] · 7 months ago

It’s genuinely fascinating and mind blowing that coherent language emerges from it

No.

It’s fascinating and mind-blowing that we made pieces of silicon do math with electrons, I can give you that as a baseline reason for awe, we needed quantum physics to get to that point. But once that is established, plausible word combinations (which we’ve had since the fucking 1960s with ELIZA) are… rather low on the awesomeness spectrum?

A good analogy is the GPS. The fact that it works at all is an amazing feat, it’s based on hunks of metal we sent to orbit and works correctly only because we understood relativity. What is not fascinating or mind-blowing is that you used it to draw a dick with a cycling app.

@feddylemmy · 7 months ago

Like judging a fish on its ability to climb a tree.

[long] Some tests of how much AI "understands" what it says (spoiler: very little)

A couple simple probes:

GPT4 is uncannily good at recognizing the river crossing puzzle

An Idiot With a Petascale Cheat Sheet

Is this a “hallucination”?

But after an update, GPT-whatever is so much better at such prompts.

The need for an Absolute Imbecile Level Reasoning Benchmark

Randomness in bullshitting