- cross-posted to:
- aicompanions
Tech experts are starting to doubt that ChatGPT and A.I. ‘hallucinations’ will ever go away: ‘This isn’t fixable’
Experts are starting to doubt it, and even OpenAI CEO Sam Altman is a bit stumped.
No, they do. That’s one of the key innovations of LLMs: the attention and feed-forward steps, where they propagate information from related words into each other based on context. From https://www.understandingai.org/p/large-language-models-explained-with?r=cfv1p
That’s exactly what I said
The words’ meanings haven’t changed, but the model can choose among a word’s different meanings based on the context.
This is the bit you are missing: the attention network actively changes the token vectors depending on context, which transfers new information into the meaning of that word.
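To make that concrete, here is a toy sketch of a single attention step. It is heavily simplified: one head, no learned query/key/value projections, and made-up 2-d embeddings (the `EMB` values are hypothetical, not taken from any real model). It only shows that the same input vector for "bank" comes out different depending on its neighbour.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(vectors):
    """One attention step with identity Q/K/V projections (a simplification)."""
    dim = len(vectors[0])
    out = []
    for q in vectors:
        # Scaled dot-product scores of this token against every token.
        weights = softmax([dot(q, k) / math.sqrt(dim) for k in vectors])
        # New vector = attention-weighted average of all token vectors.
        out.append([sum(w * v[i] for w, v in zip(weights, vectors))
                    for i in range(dim)])
    return out

# Hypothetical embeddings, invented for illustration only.
EMB = {"bank": [1.0, 1.0], "river": [0.0, 2.0], "money": [2.0, 0.0]}

bank_near_river = self_attention([EMB["river"], EMB["bank"]])[1]
bank_near_money = self_attention([EMB["money"], EMB["bank"]])[1]
print(bank_near_river, bank_near_money)
```

The "bank" vector is pulled toward whichever word it shares the context with, which is the information transfer I mean.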
The network doesn’t detect matches, but the model definitely works on similarities. Words are mapped in a hyperspace, with the idea that that space can mathematically retain conceptual similarity as spatial representation.
Words are transformed into a mathematical representation that is able (or at least tries) to retain the semantic information of the words.
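A rough sketch of what "similarity as spatial representation" means, using cosine similarity. The three vectors in `VECS` are invented for illustration; real embeddings have hundreds or thousands of dimensions and are learned during training.

```python
import math

def cosine(a, b):
    """Cosine similarity: close to 1.0 for same direction, near 0.0 for unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-d word vectors, not from any real model.
VECS = {
    "chianti":  [0.9, 0.8, 0.1],
    "merlot":   [0.8, 0.9, 0.2],
    "keyboard": [0.1, 0.0, 0.9],
}

wine_pair = cosine(VECS["chianti"], VECS["merlot"])
odd_pair = cosine(VECS["chianti"], VECS["keyboard"])
print(wine_pair > odd_pair)
```

With these made-up numbers, the two wines point in nearly the same direction, while "keyboard" points somewhere else.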
But the different meanings of the different words belong to the words themselves and are defined by the language; the model cannot modify them.
Anyway, we are talking about details here. We could bore the audience to death.
Edit: I asked GPT-4 to summarize the concepts. I believe it did a decent job. I hope it helps:
Embedding Space:
Positional Encodings:
Transformations Through Layers:
Nature of the Vector Space:
Output Space:
In essence, the entire process of token representation within the Transformer model can be seen as continuous transformations within a vector space. The space itself can be considered a learned representation where relative positions and directions hold semantic and syntactic significance. The model’s training process essentially shapes this space in a way that facilitates accurate and coherent language understanding and generation.
Yes, of course it works on similarities; I haven’t disputed that. My point was that the transformations of the token vectors are a transfer of information, and that this transfer of information is not lost as things move out of the context length. That information may slowly decohere over time if it is not reinforced, but the model does not immediately forget things as they move out of context, as you originally claimed.
It does, as the model only works with a well-defined chunk of tokens of a given length. Everything before that is lost. Clearly, part of the information from the previous context is in that chunk.
But let’s say that I am talking about wine, and at some point I mention Chianti. The chatbot and I go on discussing for over 4k words (I am using ChatGPT as an example) without mentioning Chianti. After that, the chatbot will know we are discussing wine, but it won’t know we covered the topic of Chianti.
This is what I meant.
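A minimal sketch of the fixed-window idea, assuming the simplest possible scheme where only the last `window` tokens are passed to the model. The `visible_context` helper and the tiny window size are hypothetical, and splitting a chat into single words is not how real tokenizers work.

```python
def visible_context(tokens, window):
    """Return the last `window` tokens, i.e. the only input the model sees."""
    return tokens[-window:] if window > 0 else []

# Chianti is mentioned once early on, then the chat continues about wine.
conversation = ["wine", "chianti"] + ["wine", "tannins", "vintage"] * 3

context = visible_context(conversation, window=6)
print("wine" in context)     # the topic is still visible
print("chianti" in context)  # the early mention has dropped out
```

This only shows which tokens reach the model as input; it says nothing about how much of the earlier conversation the remaining tokens still carry.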
I’m only going to reply this once more and then I’m done here, as we are going round in circles. I’m saying that is not what happens, as the attention network would link Chianti and wine together in that case and move information between them. So even after Chianti has gone out of the context window, the model is more likely to pick Chianti than Merlot when it needs a type of wine.
Good call, it doesn’t look like we are convincing each other ;)