To be clear, I’m not an expert. But I know a bit.
The way LLMs (like ChatGPT, GPT-4, etc.) work is that they repeatedly decide what the next best-sounding word might be (technically a “token”, often a piece of a word), print it, and do that over and over and over until it adds up to sentences and paragraphs. Under the hood, that next-word decision is made by a deep neural net, a kind of model originally designed as a rough imitation of the neural circuits that make up our biological nervous system and brain. The actual code for LLMs is rather small (the learned numbers it runs on are huge, though): it’s mostly about storing and managing representations of neurons, and adjusting the strengths of the connections between them as the model learns during training, loosely like the brain does.
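To make the “pick the next word, print it, repeat” loop concrete, here’s a toy sketch. Big assumption: instead of a deep neural net, the “model” here is just a table of which word most often follows which in a tiny made-up corpus. Real LLMs score every possible next token with a neural net (and often sample rather than always taking the top pick), but the outer loop has roughly the same shape. All names (CORPUS, train_bigrams, next_word, generate, max_words) are hypothetical, just for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy "model": counts of which word follows which in a tiny corpus.
# A real LLM replaces this lookup table with a neural net that scores every
# possible next token, but the outer generation loop looks similar.
CORPUS = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."


def train_bigrams(text):
    """Count, for each word, how often each other word comes right after it."""
    words = text.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def next_word(model, word):
    """Pick the single most likely next word (a greedy choice), or None if unknown."""
    candidates = model.get(word)
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


def generate(model, start, max_words=10):
    """Append one word at a time; words already emitted are never revisited."""
    output = [start]
    while len(output) < max_words:
        word = next_word(model, output[-1])
        if word is None or word == ".":
            break
        output.append(word)
    return " ".join(output)


model = train_bigrams(CORPUS)
print(generate(model, "the"))
```

Running it prints “the cat sat on the cat sat on the cat”: every word is locally the most likely follow-up to the one before it, but nothing in the loop ever looks ahead at where the sentence is going, which is a tiny-scale version of the “can’t plan ahead” limitation.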
I was listening to the first part of this “This American Life” episode this morning, which covers it really well: https://podcasts.apple.com/us/podcast/this-american-life/id201671138?i=1000618286089 In it, Microsoft AI experts express excitement and confusion about how GPT-4 seems to actually reason about things, rather than just bullshitting the next word so that it looks like it’s reasoning, which is all it was supposedly designed to do.
And so I was thinking: maybe the reason it works is the other way around. It’s not that LLMs are smart enough to reason instead of bullshit; it’s that human reasoning actually works by constantly bullshitting too, one word at a time. Imitate the human brain exactly, and I guess we shouldn’t be surprised that we end up with a familiar-looking kind of intelligence, or lack thereof. Right?
I can’t find the video, but basically GPT can only predict the next word it wants to say, or something like that… so things like making a joke with a funny punchline are almost impossible, because GPT doesn’t consider what the punchline will be until after the setup has been generated.
This inability to plan ahead results in failures at extremely simple tasks, like shortening a sentence to exactly 10 words.
The original video I saw was doing something with lists, I think. But yeah, this inability to think even 2 words ahead imposes some serious limitations.
I don’t know whether your source was using ChatGPT or GPT-4, but the podcast I linked above is about AI researchers who knew ChatGPT’s limitations very well and were therefore strongly of the opinion that LLMs couldn’t reason, and then GPT-4 came out and changed their entire thinking about LLMs. They tried to come up with experiments that could reliably show whether some reasoning had happened, and GPT-4 didn’t pass all of them, but it passed a bunch more than ChatGPT could, and a bunch more than it should have. And they have no idea how exactly it succeeds at those if the thing really can’t reason.
You’re correct that LLMs only guess the next best word, over and over. Which makes it even more mysterious why some LLMs pass those tests you just mentioned. They really shouldn’t, theoretically.
Unless… (and that’s my point) we humans theoretically shouldn’t be able to either, but there is something about neural designs and bullshitting things one word at a time that makes those specific kinds of reasoning work, and that would explain why both LLMs and humans succeed at some of them.