One of the most aggravating things to me in this world has to be the absolutely rampant anti-intellectualism that dominates so many conversations and debates, and its influence just seems to be expanding. Do you think there will ever actually be a time when this ends? I'd hope so once people become more educated and cultural changes eventually happen, but as of now it honestly infuriates me like few things ever have.
Not quite.
If you’re actually interested in the topic, I recommend searching for the writeup on Othello GPT from the Harvard/MIT researchers earlier this year.
While the topic of ‘consciousness’ is ridiculous and honestly a red herring (even in neuroscience it’s outside the scope of the science), the question of whether models have developed specialized ‘awareness’ through training is pretty much a closed topic at this point, given about a half dozen studies. There was an interesting approach from Anthropic just the other day that’s probably going to be very promising: it looks at features as the unit of introspection rather than individual nodes (e.g. sets of nodes that fire when the model is fed DNA sequences). I expect that over the next 12 months the “it’s just statistics” argument is going to be put to bed once and for all.
While yes, it develops world views and specialized subnetworks based on the training data, things like the concept of self and identity are pretty broadly represented in human writing, don’t you think?
So if we already know for certain a simple toy model fed only legal board game moves builds a dedicated part of its network for internal board representation and tracking of board state, just how certain are you that an exponentially more complex model fed effectively the entire Internet doesn’t have parts of that resulting network dedicated to modeling ego and self-reference?
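For anyone wondering what “builds an internal board representation” means in practice, the usual kind of evidence is a probe: a small classifier trained to read some property (say, the state of one board square) out of the network’s hidden activations. Here’s a rough sketch of the idea in Python. To be clear, this is not the researchers’ code, and the activations are synthetic: I’m assuming a made-up 8-dimensional hidden state where one square’s state is written along a fixed direction plus noise, and every function name here is hypothetical.

```python
import math
import random


def make_synthetic_activations(n, dim, rng):
    """Fake 'hidden states': a binary square state is written along
    one fixed direction in activation space, plus Gaussian noise.
    (Assumption for illustration only; real activations come from a model.)"""
    direction = [rng.gauss(0, 1) for _ in range(dim)]
    norm = math.sqrt(sum(d * d for d in direction))
    direction = [d / norm for d in direction]
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)  # 0 or 1: hypothetical state of one square
        sign = 1.0 if label == 1 else -1.0
        acts = [sign * 2.0 * d + rng.gauss(0, 0.5) for d in direction]
        data.append((acts, label))
    return data


def train_linear_probe(data, dim, epochs=20, lr=0.1):
    """Logistic regression trained with plain SGD: the 'probe'."""
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for acts, label in data:
            z = sum(wi * xi for wi, xi in zip(w, acts)) + b
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - label
            w = [wi - lr * err * xi for wi, xi in zip(w, acts)]
            b -= lr * err
    return w, b


def probe_accuracy(w, b, data):
    """Fraction of examples where the probe recovers the hidden label."""
    correct = 0
    for acts, label in data:
        z = sum(wi * xi for wi, xi in zip(w, acts)) + b
        predicted = 1 if z > 0 else 0
        correct += int(predicted == label)
    return correct / len(data)


rng = random.Random(0)
samples = make_synthetic_activations(600, 8, rng)
train, held_out = samples[:400], samples[400:]
weights, bias = train_linear_probe(train, dim=8)
accuracy = probe_accuracy(weights, bias, held_out)
print(f"held-out probe accuracy: {accuracy:.2f}")
```

If a probe like this does much better than chance on held-out data, the property is recoverable from the activations. With a real model you’d swap the synthetic data for recorded activations, and you’d also want controls, since a probe succeeding doesn’t by itself prove the model uses that information.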
Also, FYI, no one ‘debugs’ model weights. It’s like solving a billion-variable algebra equation, and the best we can do at the moment is very loose introspection of toy models that we hope are effective approximations of the larger ones. Directly manipulating nodes mid-run to evaluate the effects (i.e. debugging) is effectively a non-starter.