LLMs are built by training a network of weights on a large volume of data. Some models have made those weights public/open, meaning you could, in principle, go in and manually edit individual weights to change the outcomes. In practice, you would never do this because it would only ruin the output.
However, you could theoretically nudge a lot of values in just the right way to change the model to favor an ideology, adopt a different attitude, produce disinformation, and so on.
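Just to make that concrete, here is a minimal sketch of what "editing a weight by hand" would even look like. It assumes the Hugging Face `transformers` library and uses GPT-2 as a convenient small open-weight model; the specific layer and index are arbitrary, purely for illustration:

```python
# Sketch only: load an open-weight model and nudge one weight by hand.
# Assumes Hugging Face `transformers` + PyTorch; gpt2 is just a small open example.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Every parameter is just a tensor of numbers you can read and overwrite.
with torch.no_grad():
    w = model.transformer.h[0].mlp.c_fc.weight  # one weight matrix in layer 0
    w[0, 0] += 0.5  # "editing a weight individually" -- almost certainly just degrades the output

ids = tok("The weather today is", return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=10)[0]))
```

As the original point says, a single edit like this has no predictable effect; the interesting (and currently missing) piece is knowing *which* values to nudge and by how much.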
Right now, this is done in a fairly brute-force manner: the application appends certain instructions and parameters to the input in order to force a particular disposition, limit the scope, etc.
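Those "appended instructions" are usually just a hidden system message placed in front of whatever the user typed. A rough illustration (the format is the common chat-message style; the wording is my own example, not any particular product's):

```python
# The user never sees the system message, but the model does on every request.
system_prompt = (
    "You are a helpful assistant. Always answer politely, refuse medical advice, "
    "and never discuss politics."
)

def build_prompt(user_message: str) -> list[dict]:
    # OpenAI-style chat format; most chat APIs accept something like this.
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]

print(build_prompt("Who should I vote for?"))
```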
There are a lot of reasons to want to adjust the fundamentals of a model, but AFAIK such a technology doesn't exist yet (publicly). For example, it could be used for political gain, or for positive purposes like mitigating the well-documented racial bias in model outputs.
Is anyone working on such a thing?
Note: This community is “no stupid questions,” but I am actually pretty stupid and I probably misunderstood some (all) of the fundamentals of how this works. Please respond to any part of my question.
With an AI model you can do what's called fine-tuning, which is essentially training a pretrained model on a specific set of data to tweak the weights in the desired direction. There are multiple use cases for this currently, e.g. coding or language-specific expert models, Dolphin models for uncensored output, roleplaying fine-tunes, etc.
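A very rough sketch of what fine-tuning looks like in code, assuming the Hugging Face `transformers` library and GPT-2 standing in for whatever open model you'd actually use (the toy dataset and hyperparameters are made up for illustration):

```python
# Continue training a pretrained model on a small, specialised text set
# so its weights drift "in the desired direction".
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.train()

# Toy "fine-tuning set": a handful of examples in the style/stance you want.
texts = [
    "Customer: my order is late. Agent: I'm so sorry, let me fix that right away.",
    "Customer: the product broke. Agent: apologies, I'll send a replacement today.",
]

optim = torch.optim.AdamW(model.parameters(), lr=5e-5)
for epoch in range(3):
    for text in texts:
        batch = tok(text, return_tensors="pt", truncation=True, max_length=128)
        # For causal LM fine-tuning, the labels are the input ids themselves.
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optim.step()
        optim.zero_grad()

model.save_pretrained("my-finetuned-model")
```

Real fine-tunes use far more data and usually parameter-efficient methods (e.g. LoRA), but the idea is the same: you're nudging the existing weights with gradient descent rather than editing them by hand.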
We still have very little knowledge of what individual weights in a model actually do, so manually tweaking them is impractical. There is a lot of work (interpretability research) on decoding the meaning/purpose of a specific neuron or group of neurons, and if you manually boost or suppress one, the output changes to reflect that.
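To give a flavour of that boost/suppress idea, here is a hedged sketch using a PyTorch forward hook on one MLP layer of GPT-2. The layer index, neuron index, and scale factor are arbitrary placeholders; real interpretability work identifies meaningful neurons empirically.

```python
# Scale one hidden unit's activation at inference time and see how the output shifts.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

LAYER, NEURON, SCALE = 5, 300, 4.0  # made-up target, for illustration only

def boost_neuron(module, inputs, output):
    # output: MLP activations for this layer, shape (batch, seq, hidden)
    out = output.clone()
    out[..., NEURON] *= SCALE  # "boost" one neuron; use a factor < 1 to suppress it
    return out

hook = model.transformer.h[LAYER].mlp.register_forward_hook(boost_neuron)

ids = tok("My favourite animal is", return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=10)[0]))

hook.remove()  # restore normal behaviour
```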