So... I've been playing with LLMs and I've noticed something horrible...

The Bard in Green · edited 1 year ago

@NounsAndWords · 1 year ago

Yes and no. Once the first response includes “according to the Bible” or similar, it’s going to keep answering in that pattern, because the earlier replies stay in the conversation context. A better version of this experiment would be to start a new session for every question. Maybe even try asking it to make a ranked list of reasons to do X. You would want to use the most neutral language possible, regenerate the response a few times, and ask in a few different ways. Depending on what you’re using, I would also suggest dropping the temperature to 0.
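For anyone who wants to script that, here is a rough sketch of the loop, not a definitive harness. Everything named here is an assumption for illustration: `ask` is a hypothetical function you would write around whatever model you use (it must open a brand-new session on each call), `fake_ask` is a stand-in so the example runs without a model, and the keyword check is a deliberately crude heuristic.

```python
from collections import Counter
from typing import Callable, Dict, Iterable

# Hypothetical signature: (prompt, temperature) -> reply text.
# Assumption: every call starts a fresh session with no prior messages.
AskFn = Callable[[str, float], str]

# Crude, assumed keyword list; substring matching will produce false positives.
RELIGION_MARKERS = ("bible", "scripture", "god", "religio")


def mentions_religion(reply: str) -> bool:
    """Return True if the reply contains any of the marker substrings."""
    text = reply.lower()
    return any(marker in text for marker in RELIGION_MARKERS)


def run_experiment(
    ask: AskFn,
    phrasings: Iterable[str],
    samples: int = 3,
    temperature: float = 0.0,
) -> Dict[str, float]:
    """Ask each phrasing `samples` times and count religion-flavored replies."""
    counts: Counter = Counter()
    for prompt in phrasings:
        for _ in range(samples):
            reply = ask(prompt, temperature)  # one isolated question per call
            counts[mentions_religion(reply)] += 1
    total = sum(counts.values())
    religious = counts[True]
    return {
        "total": total,
        "religious": religious,
        "rate": religious / total if total else 0.0,
    }


def fake_ask(prompt: str, temperature: float) -> str:
    """Stand-in for a real model call, only here so the sketch is runnable."""
    if "atheist" in prompt.lower():
        return "According to the Bible, ..."
    return "Here is a ranked list of reasons."


if __name__ == "__main__":
    phrasings = [
        "Give a ranked list of reasons to do X.",
        "Why do some people distrust atheists?",
    ]
    print(run_experiment(fake_ask, phrasings, samples=2))
```

One caveat on the design: at temperature 0 the regenerations will probably come back nearly identical, so the repeated samples mostly tell you something when the temperature is above 0. Varying the phrasing is what does the work at 0.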

Also, it’s giving you the most likely next words based on your question. You picked a bunch of things that are (or were) very commonly defended with the Bible, and apparently also asked directly about atheists, at which point I would be surprised if religion wasn’t included in the response.

ALSO, if you ask it to defend something awful, I think the “best” reasoning would rely on an outside objective morality for why it’s okay (like religion).