In the light of Snowden's latest post: What are your FOSS-AIs?

@[email protected] · 9 months ago

@[email protected] · 9 months ago

Is abliteration based on the research by the Anthropic team, where they got Claude to say it was the Golden Gate Bridge?

FaceDeer · 9 months ago

Ironically, as far as I’m aware it’s based on research done by some AI decelerationists over on the Alignment Forum who wanted to show how “unsafe” open models were, in the hopes that regulation would be imposed to prevent companies from distributing them. They demonstrated that the “refusals” trained into LLMs could be removed with this method, allowing the models to answer questions the researchers considered scary.

The open LLM community responded by going “coooool!” and adapting the technique as a general tool for “training” models in various other ways.