LLMs are built by training a network of weights on a large volume of data. Some developers have released those weights publicly ("open weights"), meaning you could, in principle, go in and manually edit individual weights to change the outcomes. In practice you would never do this, because individual weights don't correspond to anything meaningful on their own, so hand-editing them would only ruin the output.
However, you could theoretically nudge a lot of values in just the right way to make the model favor an ideology, adopt a different attitude, produce disinformation, etc.
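To make the "weights are just numbers you could edit" idea concrete, here's a toy sketch with a single made-up layer. This is purely illustrative (a real LLM has billions of weights spread across many layers, and no single weight maps to a concept), but it shows that an edit to one number does shift the output:

```python
import numpy as np

# Toy illustration of "editing weights by hand": one made-up 4x4 layer.
# The sizes and values here are arbitrary, chosen only for the demo.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))   # the "weights" of our tiny layer
x = np.ones(4)                # some fixed input

before = W @ x                # layer output before the edit
W[0, 0] += 10.0               # manually nudge a single weight
after = W @ x                 # the corresponding output shifts by exactly 10

print(before[0], after[0])
```

Scale this up to billions of interacting weights and it becomes clear why targeted hand-editing is hopeless without some map of what the weights mean.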
Right now, this is done in a brute-force manner: the program prepends certain instructions and parameters to the input in order to force a certain disposition, limit the scope, etc.
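That brute-force approach can be sketched in a few lines. The instruction text and `build_prompt` helper below are hypothetical, not any real product's API; the point is just that the steering text is ordinary input glued on before the user's message:

```python
# Hypothetical sketch of prompt-based steering: behavioral instructions
# are prepended to the user's input before it reaches the model.
SYSTEM_INSTRUCTIONS = (
    "You are a helpful assistant. "
    "Decline requests for disinformation. "       # force a disposition
    "Only answer questions about cooking."        # limit the scope
)

def build_prompt(user_input: str) -> str:
    """Combine the hidden steering text with what the user typed."""
    return f"{SYSTEM_INSTRUCTIONS}\n\nUser: {user_input}\nAssistant:"

print(build_prompt("How do I make risotto?"))
```

Note the model's weights are untouched here; the "disposition" lives entirely in text the user never sees, which is why it can often be bypassed with clever inputs.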
There are a lot of reasons to want to adjust the fundamentals of a model, but AFAIK such a technology doesn't exist yet (publicly). For example, it could be used for political gain, or for positive purposes like removing the racist biases that have been well documented.
Is anyone working on such a thing?
Note: This community is “no stupid questions,” but I am actually pretty stupid and I probably misunderstood some (all) of the fundamentals of how this works. Please respond to any part of my question.
I read a series of super interesting posts a few months back where someone was exploring the dimensional concept space of LLMs. The jumping-off point was the discovery of weird glitch tokens that would break GPTs, sending them into a tailspin of nonsense, but from there the author presented a really interesting deep dive into how concepts are clustered dimensionally, with some fascinating examples, and (for me at least) it was explained in a very accessible manner. I don't know whether being able to identify those conceptual clusters means we're anywhere close to being able to manually tune them, but the series is well worth a read for the curious. There's also a YouTube series that really dives into the nitty-gritty of LLMs; much of it goes over my head, but it helped me understand at least the outlines of how the magic happens.
(Excuse any confused terminology here, my knowledge level is interested amateur!)
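For anyone who wants a feel for what "concepts clustering in multidimensional space" means, here's a toy sketch. The vectors are hand-made 3-d stand-ins for learned embeddings (real ones have thousands of dimensions and are learned, not assigned), but the idea is the same: related concepts point in similar directions, which cosine similarity measures:

```python
import numpy as np

# Hand-made stand-ins for learned embeddings; values are illustrative only.
embeddings = {
    "cat":   np.array([0.9, 0.8, 0.1]),
    "dog":   np.array([0.8, 0.9, 0.2]),   # near "cat": same animal cluster
    "stock": np.array([0.1, 0.2, 0.9]),   # far away: a finance cluster
}

def cosine(a, b):
    """Similarity of direction: ~1 for near-parallel vectors, ~0 for unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(embeddings["cat"], embeddings["dog"]))    # high: same cluster
print(cosine(embeddings["cat"], embeddings["stock"]))  # low: different cluster
```

Finding those clusters in a real model is exactly the kind of interpretability work the linked posts explore; turning that map into targeted edits is the part that's still an open problem.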
Posts on glitch tokens and how an LLM encodes concepts in multidimensional space: https://www.lesswrong.com/posts/8viQEp8KBg2QSW4Yc/solidgoldmagikarp-iii-glitch-token-archaeology
YouTube series is by 3Blue1Brown - https://m.youtube.com/@3blue1brown
This one is particularly relevant - https://m.youtube.com/watch?v=9-Jl0dxWQs8