So... I've been playing with LLMs and I've noticed something horrible...

The Bard in Green · edited 1 year ago

@kromem · 1 year ago

“That literally ALL of the hate speech this multi-billion-parameter model was trained on was firmly rooted in a Christian worldview.”

That’s not really what it tells us.

At best, it’s that the majority was associated with that context.

But even there, it might be less a direct association and more a secondary one. For example, it could have separately picked up the pattern “rationalizations for harming people include appeals to religion” and then regressed to the mean when filling in which religion, landing on Christianity even if the training data also included Islamic or Hindu rationalizations for harm.
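To make that concrete, here’s a toy Python sketch. The counts and the name `fill_religion_slot` are made up purely for illustration, and a real model samples from far richer learned distributions than a frequency table, so treat this as a cartoon of the “filling in the most common value” idea and not as how any actual LLM works.

```python
from collections import Counter

# Made-up toy counts, purely illustrative.
# These are NOT measurements of any real training set.
toy_samples = (
    ["Christianity"] * 6
    + ["Islam"] * 3
    + ["Hinduism"] * 1
)


def fill_religion_slot(samples):
    """Hypothetical greedy fill: always pick the single most common value."""
    return Counter(samples).most_common(1)[0][0]


# Fill the slot many times and collect the distinct answers produced.
outputs = {fill_religion_slot(toy_samples) for _ in range(100)}
print(outputs)
```

In this toy setup the data contains three religions but the output only ever shows one, so looking at the outputs alone would badly misjudge what went in.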

One common misconception is that what it spits out is just surface statistics. That can sometimes be the case, but often it isn’t, and there’s much deeper network activity going on instead.

All that said, it wouldn’t be surprising to me at all if the majority of misogynistic, racist, or hateful speech samples in a training set were adjacent to content in line with neo-fascist Christian nationalism.

I just wouldn’t look at the output from an LLM as perfectly reflecting the entirety of the training set.