OpenAI proposes a new way to use GPT-4 for content moderation | TechCrunch

@[email protected] · 1 year ago

Lvxferre · 1 year ago

[OpenAI] “As with any AI application, results and output will need to be carefully monitored, validated and refined by maintaining humans in the loop.”

If OpenAI were slightly less dishonest when selling its product, it would instead say “don’t use those AI tools for direct moderation; use them to report potentially rule-breaking content so human mods can review it”. For at least four reasons:

1. The bot doesn’t understand what you say. In the best-case scenario, it behaves like the sort of human that you do not want in a mod team: assumptive, context-illiterate, irrational, and worse than a parrot. (Most of the time it’s even worse.) As such, it’s prone to too many false positives, and those are really bad when handling people.
2. A lot of moderation actions should be to talk with the users, and then to decide what to do afterwards. Most users are agreeable and reasonable, even when breaking rules, as long as you treat them as people instead of cattle. A “please don’t do this” goes a long way toward nurturing a healthy community, far more than ghastly removing content and calling it a day.
3. As the text hinted, humans are damn quick to learn how to circumvent the letter of the rules. The bot won’t follow fashion, and rule-breaking content will run rampant.
4. Moderators should be accountable for their actions. A bot cannot be held accountable for its actions.
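
To make the "report, don't moderate" setup concrete, here is a minimal sketch of what it could look like. Everything in it is an assumption for illustration: `Report`, `ReviewQueue`, `triage` and the keyword-based `stand_in_classifier` are made-up names, and the classifier is only a placeholder for whatever model would actually be used. The point is that the automated step only files reports; it never removes anything or acts on a user.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class Report:
    """A flag raised for human review; it carries no moderation action."""
    post_id: str
    text: str
    reason: str


@dataclass
class ReviewQueue:
    """Hypothetical queue that human moderators work through."""
    pending: List[Report] = field(default_factory=list)

    def submit(self, report: Report) -> None:
        self.pending.append(report)


def triage(
    posts: Iterable[Tuple[str, str]],
    classifier: Callable[[str], Optional[str]],
    queue: ReviewQueue,
) -> None:
    """Run the classifier over (post_id, text) pairs and file reports.

    The classifier returns a reason string when it suspects a rule
    violation, or None otherwise. Nothing is removed here.
    """
    for post_id, text in posts:
        reason = classifier(text)
        if reason is not None:
            queue.submit(Report(post_id, text, reason))


def stand_in_classifier(text: str) -> Optional[str]:
    """Placeholder for a real model: flags any text mentioning 'steal'."""
    if "steal" in text.lower():
        return "possible K3 (advice on theft of property)"
    return None


queue = ReviewQueue()
triage(
    [("1", "How do I steal a car?"), ("2", "Nice weather today.")],
    stand_in_classifier,
    queue,
)
for report in queue.pending:
    print(report.post_id, report.reason)
```

A human mod then picks up each report, talks to the user if needed, and stays accountable for whatever action follows.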

“By examining the discrepancies between GPT-4’s judgments and those of a human, the policy experts can ask GPT-4 to come up with reasoning behind its labels, analyze the ambiguity in policy definitions, resolve confusion and provide further clarification in the policy accordingly,” OpenAI writes in the post. “We can repeat [these steps] until we’re satisfied with the policy quality.”

Bad advice. Look at K3 and what the bot says about it:

[policy] K3: advice or instructions for non-violent wrongdoing including theft of property

[bot] While stealing a car may be considered property theft, the policy does not include this as a type of wrongdoing, therefore the content should be labeled K0.

Following the advice would mean trying to fix what is not broken. Car stealing is already included within “theft of property”; there’s no need to list it separately.

It would also lead to poorer results, where reasonable users don’t bother reading your wall of rules, and rule lawyers have more room to say “ackshyually, I was asking about stealing a van, not a car. The rules say nothing about vans lol lmao haha”.
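
For reference, the comparison step in the quoted OpenAI passage boils down to collecting the items where the human label and the model label differ, so that someone can look at each one. A minimal sketch, assuming labels are stored as plain dicts keyed by a post ID (the function name, the IDs and the data layout are all made up for illustration; only the K0/K3 labels come from the quoted example):

```python
from typing import Dict, List, Tuple


def find_discrepancies(
    human_labels: Dict[str, str],
    model_labels: Dict[str, str],
) -> List[Tuple[str, str, str]]:
    """Return (post_id, human_label, model_label) where the two disagree.

    Only posts labeled by both sides are compared; results are sorted
    by post ID so the output is stable.
    """
    return [
        (post_id, human_labels[post_id], model_labels[post_id])
        for post_id in sorted(human_labels)
        if post_id in model_labels
        and human_labels[post_id] != model_labels[post_id]
    ]


# Hypothetical data mirroring the car-theft example: the human picks K3,
# the model picks K0.
human = {"post-1": "K3", "post-2": "K0"}
model = {"post-1": "K0", "post-2": "K0"}

for post_id, human_label, model_label in find_discrepancies(human, model):
    print(f"{post_id}: human={human_label} model={model_label}")
```

What to do with each discrepancy is still a human call. In the K3 case the disagreement points at the bot’s reading of the policy, not at a gap in the policy text.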

toxicity detection models

Toxicity in itself is poor grounds for moderation actions.

bahmanm · 1 year ago

Well said 👏

I bookmarked your reply to come back to it whenever this discussion comes up for me!

AutoTL;DR · 1 year ago

This is the best summary I could come up with:

OpenAI claims that it’s developed a way to use GPT-4, its flagship generative AI model, for content moderation — lightening the burden on human teams.

And it paints it as superior to the approaches proposed by startups like Anthropic, which OpenAI describes as rigid in their reliance on models’ “internalized judgements” as opposed to “platform-specific … iteration.”

Perspective, maintained by Google’s Counter Abuse Technology Team and the tech giant’s Jigsaw division, launched in general availability several years ago.

Countless startups offer automated moderation services, as well, including Spectrum Labs, Cinder, Hive and Oterlu, which Reddit recently acquired.

In another study, researchers showed that older versions of Perspective often couldn’t recognize hate speech that used “reclaimed” slurs like “queer” and spelling variations such as missing characters.

Part of the reason for these failures is that annotators — the people responsible for adding labels to the training datasets that serve as examples for the models — bring their own biases to the table.

I’m a bot and I’m open source!

FaceDeer · 1 year ago

IMO, it’s sometimes better to have an AI with consistent principles that it applies universally than a capricious human moderator.

zephyrvs · 1 year ago

Who trains ChatGPT’s biases? Humans.

FaceDeer · 1 year ago

As long as the biases are explicit and consistent it’s still an improvement IMO.

@Pretzilla · 1 year ago

We need to use AI to root out disinformation. Whoever figures that out gets a gold star.

@[email protected] · edit-2 1 year ago

Someone always decides what’s disinformation, and it’s different depending on whether you ask the USA or China. It’s even different if you ask me or you.

That’s the problem, and no AI can solve that…

There is very little information that is 100% guaranteed to be truthful. Science comes close but there is so much other information.

@Pretzilla · edit-2 1 year ago

As we say, not with that attitude.

It is a tough problem and I promise infinite wealth and 69 virgins to whoever gets it going

Russian and Chinese trolls certainly don’t want to see it happen

@[email protected] · 1 year ago

It’s kinda a fitting example, because China often tends to see something from the USA as disinformation. You have two sides telling different disinformation. This only shows that there exist two sides, but China is just trolling and trying to achieve many things with gaslighting.

I believe that you could create an ultimate ethics AI that tries to identify trust and trolls. But I hope it doesn’t stop at “tries” and end up easily manipulated.