DeepSeek Tiananmen Square controversy gets weirder

@[email protected] · 17 hours ago

@TropicalDingdong · 21 hours ago

From a technical perspective, it seems like a pretty clear case of model over-fitting to me. I haven't dug too deep into this recent advancement, but this might be specific to how this particular model was trained.

Were you running this on local hardware?

@[email protected] · 17 hours ago

All run on local hardware. Llama models do it too. This is the 8B model.

@[email protected] · 21 hours ago

Heh, the entire post is about running the self-hosted versions.

@TropicalDingdong · 20 hours ago

Right on. Yeah, so that would also point toward extreme over-fitting during training. You might check out this notebook on abliteration if you want to learn more: https://colab.research.google.com/drive/1VYm3hOcvCpbGiqKZb141gJwjdmmCcVpR

My guess is you could confirm this by looking at the high-dimensional orthogonality between the activations of an instruction that is refused and those of some instructions that are not. You could probably just try the two different spellings, Tianemen and Tianamen, to key in on the censored response vector.
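For anyone wanting to try that check, here's a minimal sketch of the idea, assuming you've already pulled one activation vector per prompt from some layer of the model (how you extract those depends on your inference stack and isn't shown). It uses the difference-of-means "refusal direction" that abliteration is usually described with, cosine similarity as the orthogonality measure, and a projection to ablate the direction. The toy 4-dimensional vectors and every function name here are hypothetical, made up for illustration; real activations have thousands of dimensions.

```python
import math
from typing import List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Vector) -> float:
    return math.sqrt(dot(a, a))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; values near 0 mean the vectors are nearly orthogonal."""
    return dot(a, b) / (norm(a) * norm(b))


def refusal_direction(refused: List[Vector], accepted: List[Vector]) -> Vector:
    """Unit vector pointing from the mean 'accepted' activation to the mean 'refused' one."""
    diff = [r - a for r, a in zip(mean_vector(refused), mean_vector(accepted))]
    length = norm(diff)
    return [x / length for x in diff]


def ablate(x: Vector, r_hat: Vector) -> Vector:
    """Project out the unit direction r_hat: x - (x . r_hat) * r_hat."""
    proj = dot(x, r_hat)
    return [xi - proj * ri for xi, ri in zip(x, r_hat)]


# Hypothetical toy activations: imagine one spelling gets refused, the other doesn't.
refused = [[1.0, 2.0, 0.0, 0.5], [1.2, 1.8, 0.1, 0.5]]
accepted = [[1.0, 0.0, 0.0, 0.5], [1.2, 0.2, 0.1, 0.5]]

if __name__ == "__main__":
    r_hat = refusal_direction(refused, accepted)
    for label, acts in (("refused", refused), ("accepted", accepted)):
        for x in acts:
            # Refused prompts should line up with r_hat; accepted ones should sit near 0.
            print(label, round(cosine(x, r_hat), 3))
```

If the refused prompts show a large cosine with the direction and the accepted ones sit near zero, that's the kind of signal you'd expect if a single censored-response direction is doing the work; `ablate` is the step abliteration applies to remove it.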