“Since the quality of AI deception and the ways you can do it keeps improving and shifting, this is an important element to keep the policy dynamic as AI and usage gets more pervasive or more deceptive, or people get more accustomed to it,” Mr Gregory said.
He added that focusing on labelling fake posts would be an effective solution for some content, such as videos which have been recycled or recirculated from a previous event, but he was sceptical about the effectiveness of automatically labelling content manipulated using emerging AI tools.
I’m not a social scientist, but it’s a mixed bag. Here are the top results from Google Scholar:
There is growing concern over the spread of misinformation online. One widely adopted intervention by platforms for addressing falsehoods is applying “warning labels” to posts deemed inaccurate by fact-checkers. Despite a rich literature on correcting misinformation after exposure, much less work has examined the effectiveness of warning labels presented concurrent with exposure. Promisingly, existing research suggests that warning labels effectively reduce belief and spread of misinformation. The size of these beneficial effects depends on how the labels are implemented and the characteristics of the content being labeled. Despite some individual differences, recent evidence indicates that warning labels are generally effective across party lines and other demographic characteristics.
Social media platforms face rampant misinformation spread through multimedia posts shared in highly-personalized contexts [10, 11]. Foundational qualitative research is necessary to ensure platforms’ misinformation interventions are aligned with users’ needs and understanding of information in their own contexts, across platforms. In two studies, we combined in-depth interviews (n=15) with diary and co-design methods (n=23) to investigate how a mix of Americans exposed to misinformation during COVID-19 understand their information environments, including encounters with interventions such as Facebook fact-checking labels. Analysis reveals a deep division in user attitudes about platform labeling interventions, perceived by 7/15 interview participants as biased and punitive. As a result, we argue for the need to better research the unintended consequences of labeling interventions on factual beliefs and attitudes.
These findings also complicate discussion around “the backfire effect”, the idea that when a claim aligns with someone’s ideological beliefs, telling them that it’s wrong will actually make them believe it even more strongly [35]. Though this phenomenon is thought to be rare, our findings suggest that emotionally-charged, defensive backfire reactions may be common in practice for American social media users encountering corrections on social media posts about news topics. While our sample size was too small to definitively measure whether the labels actually strengthened beliefs in inaccurate claims, at the very least, reactions described above showed doubt and distrust toward the credibility of labels, often with reason, as in the case of “false positive” automated application of labels in inappropriate contexts.
In the case of state-controlled media outlets on YouTube, Facebook, and Twitter, this has taken the form of labeling their connection to a state. We show that these labels have the ability to mitigate the effects of viewing election misinformation from the Russian media channel RT. However, this is only the case when the platform places the label prominently enough that it is not missed by users.
Using appropriate statistical tools, we find that, overall, label placement did not change the propensity of users to share and engage with labeled content, but the falsity of content did. However, we show that the presence of textual overlap in labels did reduce user interactions, while stronger rebuttals reduced the toxicity in comments. We also find that users were more likely to discuss their positions on the underlying tweets in replies when the labels contained rebuttals. When false content was labeled, results show that liberals engaged more than conservatives. Labels also increased the engagement of more passive Twitter users. This case study has direct implications for the design of effective soft moderation and related policies.