Reversal knowledge in this case means: if the LLM knows that A is B, does it also know that B is A? Apparently the answer is a pretty resounding no! I’d be curious to see whether some CoT would affect the results at all.
Meh. Either I’m doing something wrong, or we should stop linking (only) Twitter posts. I can only see the original 42 words and a picture, with no mention of a paper or thread that clarifies what this means.
For other people with the same problem, here’s the website of the person: https://owainevans.github.io/
And here’s the mentioned paper: https://owainevans.github.io/reversal_curse.pdf
Yeah, fair point, I’ll make sure to include better links in the future :) I typically post from mobile, so it’s annoying but doable.
thank you
While I’m not totally caught up in LLM magic, as far as I am aware, all LLMs are doing is (heavily simplified) very fancy autocorrect/text prediction. LLMs don’t “know” anything. They aren’t equating anything; they aren’t “learning” the way people learn, by association or by relating a word to an object or idea. So from my understanding, your assumption that
LLM knows that A is B
already doesn’t make sense in the context of current LLMs. Lots of people have made posts about A is B. So text prediction is saying when A, high probability that then B. It’s not pulling from a base of knowledge and then constructing an answer to a question based on that pool of knowledge. It’s finding relationships between character groups. Similar to how LibreOffice can check your grammar based on known patterns, a LLM can use the greater context of its training data to find larger and larger patterns of character groups.
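For what it’s worth, the “when A, high probability that then B” idea can be shown with a toy next-word counter. This is only a sketch: a bigram table is vastly simpler than an LLM, and the corpus and the `predict_next` helper are made up for illustration. It does show how statistics collected in one direction say nothing about the reverse direction.

```python
from collections import Counter, defaultdict

# Toy corpus (made up): each fact is only ever written in one direction.
corpus = [
    "tom cruise's mother is mary lee pfeiffer",
    "tom cruise's mother is mary lee pfeiffer",
    "spaghetti is pasta",
]

# Count which word follows which (a bigram table).
following = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1


def predict_next(word):
    """Return the most frequent next word, or None if nothing ever followed it."""
    counts = following.get(word)
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

Here `predict_next("spaghetti")` gives `"is"`, while `predict_next("pasta")` gives `None`, because "pasta" only ever appears at the end of a sentence in this corpus.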
I’m not a computer scientist, but from my understanding LLMs are widely misunderstood. People talk so often about how they “hallucinate”, or that they are “inaccurate”, but I think those discussions are totally irrelevant in the long term. Have you ever considered that your phone’s text completion is lying? What does that even mean, for auto-correct to lie? It doesn’t know anything; it’s just guessing the next letters/words given the words written so far. That’s all LLMs are doing too, just in a significantly more sophisticated way. So I have never once ever considered anything produced by a LLM as true or false, because it cannot possibly do that.
To start, everything you’re saying is entirely correct
However, the existence of emergent behaviours like chain-of-thought reasoning shows that there’s more to this than pure text prediction. The model picks up patterns that were never explicitly trained, so it’s entirely reasonable to wonder whether it can recognize reverse patterns.
Hallucinations are a vital part of understanding the models. They might not be a long-term problem, but getting models to recognize what they actually know to be true is extremely important for the growth and adoption of LLMs.
I think there’s a lot more to the training and generation of text than you’re giving it credit for. The simplest way to explain it is that it’s text prediction, but there’s way too much depth to the training and the model to say that’s all it is.
At the end of the day it’s just a fun, thought-provoking post :) But when Andrej Karpathy says he doesn’t have a great intuition for how LLM knowledge works (though in fairness he theorizes the same as you, directional learning), I think we can at least agree none of us knows for sure what is correct!
So I have never once ever considered anything produced by a LLM as true or false, because it cannot possibly do that.
You’re looking at this in an overly literal way. It’s kind of like if you said:
Actually, your program cannot possibly have a “bug”. Programs are digital information, so it’s ridiculous to suggest that an insect could be inside! That’s clearly impossible.
“Bug”, “hallucination”, “lying”, etc. are just convenient ways to refer to things. You don’t have to interpret them by the literal meaning of the word. It also doesn’t take anything as sophisticated as a LLM for a program to “lie”. For example, I could write a program that logs some status information. It could log that everything is fine and then immediately crash: clearly everything isn’t actually fine. I might say the program is “lying”, but this is just a way of saying that what it reports doesn’t correspond to what is factually true.
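Something like this is what I have in mind. It’s a minimal sketch, and the function names and the log message are invented for the example: the program reports success before it has actually done the work, then crashes.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("status_demo")


def run_job(records):
    # The status line is written before any real work has been checked...
    log.info("status: everything is fine")
    # ...so it can be contradicted immediately by what happens next.
    return sum(records) / len(records)  # ZeroDivisionError on an empty list


def report_then_crash():
    """Run the job on empty input; return True if it crashed after reporting success."""
    try:
        run_job([])
    except ZeroDivisionError:
        return True
    return False
```

Calling `report_then_crash()` logs the cheerful status line and then hits the division by zero, so the log and reality disagree without the program "intending" anything.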
People talk so often about how they “hallucinate”, or that they are “inaccurate”, but I think those discussions are totally irrelevant in the long term.
It’s actually extremely relevant in terms of putting LLMs to practical use, something people are already doing. Even when talking about plain old text completion for something like a phone keyboard, it’s obviously relevant if the completions it suggests are accurate.
So text prediction is saying when A, high probability that then B.
This is effectively the same as “knowing” A implies B. If you get down to it, human brains don’t really “know” anything either. It’s just a bunch of neurons connected up, maybe reaching a potential and firing, maybe not, etc.
(I wouldn’t claim to be an expert on this subject, but I am reasonably well informed. I’ve written my own implementation of LLM inference and contributed to other AI-related projects as well; you can verify that with the GitHub link in my profile.)
That’s a logical fallacy. Given A is B it does not follow that B is A.
edit: it would make sense if it was phrased as “A is equivalent to B”. Saying “A is B” in a scientific context has a very specific meaning. Makes me wonder how trustworthy the paper itself is.
I’m not really sure I follow. It’s just a simplification; the most appropriate phrasing, I guess, would be “given A belongs to B, does it know B ‘owns’ A”, like the examples given with “A is the son of B, is B the parent of A”.
Looks like the findings are specifically about out-of-context learning, i.e. fine-tuning on facts like “Tom Cruise’s mother was Mary Lee Pfeiffer” is not enough to be able to answer a question like “Who are the children of Mary Lee Pfeiffer?” without any prompt engineering/tuning.
However, if you have in the context something like “Who was Tom Cruise’s mother?”, then the LLM has no problem answering correctly “Who are the children of Mary Lee Pfeiffer?”, listing all the children, including Tom Cruise.
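To make the two setups concrete, here is a rough sketch of how the reverse question could be posed with and without the forward fact in the context. The function names and prompt templates are hypothetical and are not taken from the paper; no model is called, it only builds the text.

```python
def reverse_question(parent):
    """The 'B is A' direction: ask about the parent's children."""
    return f"Who are the children of {parent}?"


def build_probe(child, parent, with_context):
    """Build the reverse question, optionally preceded by the forward fact."""
    question = reverse_question(parent)
    if with_context:
        # In-context case: the forward fact is sitting right there in the prompt.
        return f"{child}'s mother was {parent}.\n{question}"
    # Out-of-context case: the model has to rely on what it absorbed in training.
    return question


# The example from the discussion above.
bare = build_probe("Tom Cruise", "Mary Lee Pfeiffer", with_context=False)
primed = build_probe("Tom Cruise", "Mary Lee Pfeiffer", with_context=True)
```

The `bare` prompt is the case described as failing after fine-tuning alone, and `primed` is the case where the answer is recoverable from the context.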
Note that it would be confusing even to a human to ask “Who is the son of Mary Lee Pfeiffer?”, which is what they test on, since the lady had more than one son. That was the point of my comment, it’s just a misleading question.
But that’s not the general issue the researchers have unearthed, which is what I had assumed based on the “A is B” summary, so yeah, it’s just a poor choice of wording.
Spaghetti is pasta, but pasta is not (necessarily) spaghetti, right?
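In set terms, reading “A is B” as “A is a kind of B”, the asymmetry looks like this. It’s just an illustration with a made-up list of pasta shapes:

```python
# Made-up example sets: every spaghetti is pasta, but not the other way round.
pasta = {"spaghetti", "penne", "fusilli"}
spaghetti = {"spaghetti"}

spaghetti_is_pasta = spaghetti <= pasta  # subset test: "A is B"
pasta_is_spaghetti = pasta <= spaghetti  # the reversed claim: "B is A"
equivalent = spaghetti == pasta          # "A is equivalent to B"
```

Only the first of the three comes out `True`; the reversal would hold only if the two sets were equal.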