Wikipedia has a new initiative called WikiProject AI Cleanup. It is a task force of volunteers currently combing through Wikipedia articles, editing or removing false information that appears to have been posted by people using generative AI.

Ilyas Lebleu, a founding member of the cleanup crew, told 404 Media that the crisis began when Wikipedia editors and users began seeing passages that were unmistakably written by a chatbot of some kind.

  • @[email protected]
    238
    2 months ago

    Further proof that humanity neither deserves nor is capable of having nice things.

    Who would set up an AI bot to shit all over the one remaining useful thing on the Internet, and why?

    I’m sure the answer is either ‘for the lulz’ or ‘late-stage capitalism’, but still: historically humans aren’t usually burning down libraries on purpose.

    • @poszod
      116
      2 months ago

      State actors could be interested in doing that. Same with the internet archive attacks.

    • @[email protected]
      98
      2 months ago

      “historically humans aren’t usually burning down libraries on purpose.”

      How on earth have you come to this conclusion?

      • @[email protected]
        36
        2 months ago

        To be fair, it’s usually to effect cultural genocide. It’s not average people burning libraries, it’s usually some kind of authoritarian regime.

        • @SacralPlexus
          34
          2 months ago

          *looks around and gestures broadly in agreement*

    • @Regrettable_incident
      13
      2 months ago

      “historically humans aren’t usually burning down libraries on purpose.”

      Sometimes they are. Baghdad springs to mind, and I’m sure there are other examples. And this library is online, so there’s less chance of getting caught with a can of petrol and a box of matches.

      Then there’s every authoritarian regime that tries to ban or burn specific types of books. What we’re seeing here could be more like that - an attempt to muddy the waters or introduce misinformation on certain topics.

    • @Wrench
      9
      2 months ago

      Because basement losers can’t conquer and raze libraries to the ground.

      The internet has shown that assumed anonymity results in people fucking with other people’s lives for the hell of it. Viruses, trolling, etc. This is just the next stage of it because of a new, easy-to-use tool.

    • @[email protected]
      4
      2 months ago

      It’s not about doing it on purpose; most people usually don’t care about what’s not in their interest. Today, interests are usually quite shallow, which TikTok shows quite well. Libraries do require money to operate, even the Internet Archive and Wikipedia.

    • @rsuri
      4
      2 months ago

      Yeah but the other thing about humanity is it’s mostly harmless. Edits can be reverted, articles can be locked. Wikipedia will be fine.

      • @[email protected]
        14
        2 months ago

        “Edits can be reverted, articles can be locked.”

        Sure, but the vandalism has to be identified first. And that takes time and effort.

      • @[email protected]
        -4
        2 months ago

        Wikipedia relies on sources, and humans choosing the sources like newspapers. And those newspapers are more and more inside a “bubble” that rejects any evidence or reporting presented by a competing bubble.

        Right now wikipedia is covering up one of the greatest acts of mass murder of our times, because the newspapers are covering it up, or rejecting evidence because it’s by the “enemy”. Part of this is a defensive posture against AI bots and enemy disinformation.

    • @weeeeum
      1
      2 months ago

      It’s because there’s no accountability for cybercrimes. If humans always had a button to burn down libraries, I’m sure they would have. Instead they had to put themselves in harm’s way to do such things.

      People do things cause they can, and fucking with Wikipedia is apparently simple.

    • @[email protected]
      1
      2 months ago

      Maybe a strange way of activism that is trying to poison new AI models 🤔

      Which would not work, since all tech giants have already archived the pre-AI internet.

      • @[email protected]
        8
        2 months ago

        Ah, so the AI version of the chewbacca defense.

        I have to wonder if intentionally shitting on LLMs with plausible nonsense is effective.

        Like, you watch for certain user agents and change what data you actually send the bot vs what a real human might see.

        • @[email protected]
          2
          2 months ago

          I suspect it would be difficult to generate enough data to intentionally change a dataset. There are certainly little holes, like the glue pizza thing, but finding and exploiting them would be difficult and noticing you and blocking you as a data source would be easy.

        • @T156
          1
          2 months ago

          “I have to wonder if intentionally shitting on LLMs with plausible nonsense is effective.”

          I don’t think so. The volume of data is too large for it to make much of a difference, and a scraper can just mimic a human user agent and work that way.

          You’d have to change so much data consistently across so many different places that it would be near-impossible for a single human effort.

  • @[email protected]
    115
    2 months ago

    As for why this is happening, the cleanup crew thinks there are three primary reasons.

    “[The] main reasons that motivate editors to add AI-generated content: self-promotion, deliberate hoaxing, and being misinformed into thinking that the generated content is accurate and constructive,”

    That last one. Ouch.

    • @givesomefucks
      48
      2 months ago

      The vast majority of people think they’re the good guys…

    • TimLovesTech (AuDHD)(he/him)
      34
      2 months ago

      “[The] main reasons that motivate editors to add AI-generated content: self-promotion, deliberate hoaxing, and being misinformed into thinking that the generated content is accurate and constructive,”

      I think the main driver behind people misinformed about AI content comes from the fact that outside of tech people, most have no idea that AI will:

      1. 100% make up answers to things it doesn’t know because either the sample size of data they have ingested was too small or was bad. And it will do this with the same robot confidence you get for any other answer.

      2. AI that has been fed too much other AI-generated content will begin to “hallucinate” and give some wild outputs, very similar to humans suffering from schizophrenia. And again these answers will be given as “fact” with the same robotic confidence.

    • @[email protected]
      6
      2 months ago

      Well, I was in doubt, so I asked the AI whether I could trust the answers and it told me not to worry about it. That must mean that I only get accurate answers, right? /s

  • @[email protected]
    69
    2 months ago

    Unleashing generative AI on the world was basically the information equivalent of jumping headfirst into Kessler Syndrome.

    • @khannie
      44
      2 months ago

      For the uninitiated like me:

      The Kessler syndrome (also called the Kessler effect, collisional cascading, or ablation cascade), proposed by NASA scientists Donald J. Kessler and Burton G. Cour-Palais in 1978, is a scenario in which the density of objects in low Earth orbit (LEO) due to space pollution is numerous enough that collisions between objects could cause a cascade in which each collision generates space debris that increases the likelihood of further collisions.

      Wikipedia link.

        • @khannie
          11
          2 months ago

          I did think that. :) It’s just… So good. I hope it never enshittifies. God help us.

  • @[email protected]
    52
    2 months ago

    Best case is that the model used to generate this content was originally trained on data from Wikipedia, so it “just” generates a worse, hallucinated “variant” of the original information. Goes to show how stupid this idea is.

    Imagine this in a loop: AI trained on Wikipedia then alters content on Wikipedia, which in turn gets picked up by the next model trained. It would just get worse and worse, similar to how converting the same video over and over again yields continuously worse results.

    • @[email protected]
      24
      2 months ago

      See also: model collapse

      (Which is more or less just regression towards the mean with more steps)

    • @Wrench
      15
      2 months ago

      Yes, this is what many of us worry will become the internet in general. AI content generated by AI trained on AI garbage.

      AI bots can trivially outpace humans.

      • @[email protected]
        11
        2 months ago

        I was just discussing with a friend of mine how we’re rapidly approaching the dead internet. At some point, many websites will likely just be chat bots talking to other chat bots, which then gets used to train further chat bots. Human-made content is already becoming harder and harder to find on algorithm-heavy websites like Reddit and Facebook’s suite of sites. The bots can easily outpace any algorithmic changes they might make to help deter them, but my FB-using family members all constantly block those weird Jesus accounts and they still show up constantly.

    • Captain Aggravated
      7
      2 months ago

      Eventually every article just reads “Delve delve delve delve delve delve delve.”

    • @8uurg
      6
      2 months ago

      A very similar situation to that analysed in this paper that was recently published. The quality of what is generated degrades significantly.

      Although they mostly investigate replacing the data with AI-generated data in each step, so I doubt the effect will be as pronounced in practice. Human writing will still be included, and even curation of AI-generated text by people can skew the distribution of the training data (as the process by these editors would inevitably do, as reasonable text could get through the cracks).

      • Blaster M
        2
        1 month ago

        AI model makers are very well aware of this, and there is a move from ingesting everything to curating datasets more aggressively. Data prep is something many upstarts have no idea is critical, but everyone is learning about it, sometimes the hard way.

    • @Zorque
      3
      2 months ago

      Every article would end up being the philosophy page.

  • @TheGrandNagus
    43
    2 months ago

    Jesus Christ. The amount of absolute bellends in the world never ceases to confound me.

    • Bahnd Rollard
      34
      2 months ago

      They used to be contained; every village had its idiot. Now that the internet is the global village, all the formerly isolated idiots have a place to chat.

      • sunzu2
        7
        2 months ago

        Amazing how these idiots are this effective…

        While us common folk can’t organize or agree on anything

        • @[email protected]
          6
          2 months ago

          Most of us do something idiotic once and, when the opportunity to do it again comes up, pull back and think, “this was embarrassing last time, maybe I’ll re-evaluate.”

          But a dedicated idiot is a different beast, full of confidence and having had whatever organ produces shame surgically removed, enabling them to commit ever greater acts of idiocy. But then the internet was invented and these people met. Some even had babies. And now there is an arms race to see how many idiots can squeeze through the same tiny door. They have recognised their time to shine and seized it with their clammy yet also sticky hands.

          Truly, it’s inspiring in its own special way

  • e$tGyr#J2pqM8v
    42
    2 months ago

    Sabotage Wikipedia, DDoS the Internet Archive. Makes you wonder if in the future we’re going to forget our past. Will actual history be obscured in a sea of alternative histories unrecognizably presented as the same thing? Maybe we need to keep some books lying around in archives just to be sure.

    • @[email protected]
      16
      2 months ago

      The digital dark age will be a real thing, absolutely.

      Interesting idea on a sea of alternative histories. That might be a possible threat.
      Someone else here called it “AI text apocalypse”. I like that term.

    • @[email protected]
      3
      2 months ago

      We still have Anna’s Archive, Sci-Hub, LibGen and old-fashioned traditional libraries (including the national ones). National libraries won’t disappear in the coming years; maybe they will rot due to defunding, but they will still exist.

  • @randon31415
    41
    2 months ago

    If anyone can survive the AI text apocalypse, it is Wikipedia. They have been fending off and regulating article-writing bots since someone coded up a US town article writer from the 2000 census (not the 2010 or 2020 census, the 2000 census. This bot was writing Wikipedia articles in 2003).

    • @T156
      8
      2 months ago

      Hopefully they tightened things up after the Scots incident.

          • @rottingleaf
            4
            2 months ago

            Yep, I recalled that.

            A certain number of Russians with Cossack roots would do this with Ukrainian on the web, causing a bit less butthurt because, TBH, a lot of Ukrainians don’t speak anything like proper Ukrainian but a mix of Ukrainian and Russian, and a lot of the rest speak dialects that still differ from the standard.

    • @[email protected]
      5
      2 months ago

      Well, for everything except fictional articles. That’s the hardest for them, historically.

  • @[email protected]
    37
    2 months ago

    I hate to post because I have loved and trusted Wikipedia for years, but the fact that there are folks out there who equally trust what AI tools generate just baffles me.

    • @[email protected]
      8
      2 months ago

      The signal to noise ratio is so low these days. There’s so much information out there but everyone wants to profit from you before you can get it. Even worse, the people with good information generally can’t buy as big a megaphone as the people who profit from lying to you.

      Honestly, I think humans have been more likely to believe an easy lie than a hard truth all along, but it’s easier than ever these days.

  • @nutsack
    31
    2 months ago

    why the fuck would anyone stick ai shit on wikipedia that doesn’t make any sense

    • @NateNate60
      35
      2 months ago

      “[The] main reasons that motivate editors to add AI-generated content: self-promotion, deliberate hoaxing, and being misinformed into thinking that the generated content is accurate and constructive,” Lebleu said.

      • @nutsack
        13
        2 months ago

        so, stupidity basically. they’re just stupid.

        • @[email protected]
          10
          2 months ago

          Many people who are trying to push lies have an agenda to undermine Wikipedia. Trump, Putin supporters, etc.

    • @InverseParallax
      7
      2 months ago

      The irony being that a huge amount of the LLM knowledge was based on WP in the first place, that and scientific papers.

    • @varjen
      19
      2 months ago

      Or download it in a bunch of other ways directly from Wikipedia.

  • Aatube
    25
    2 months ago

    Don’t worry, it’s not as bad as the title suggests. The attack on Internet Archive is far, far worse. It’s obviously a bit of a problem, though.

  • @WhatsHerBucket
    13
    2 months ago

    This is why we can’t have nice things

      • @[email protected]
        2
        2 months ago

        I wouldn’t know. I use pihole to block all ads on my TV OS. I’m curious though, which service/app is giving you ads on pause? Do you mean like on a Roku TV where the screensaver is ads? Many TVs let you disable that (i.e. LG WebOS.) otherwise pihole is your friend :-)

        • @[email protected]
          1
          2 months ago

          My TV is old enough that it doesn’t have it, I’m just talking about the general trend toward making that a thing. I’m not going to buy a TV that forces ads on me, and the fact that I have to actively look for that on my next TV is appalling.

          • @[email protected]
            1
            2 months ago

            I have bad news for you. Literally every TV has ads now. Every. Single. One. That’s why I keep harping on Pihole. It blocks them.

            • @[email protected]
              2
              2 months ago

              Not the commercial grade ones, like “hospitality” TVs. They’re more expensive, but they’re also intended to be a bit more reliable as well.

              I’m worried they’ll adapt the ads to not be blockable w/ Pihole.

              • @[email protected]
                1
                2 months ago

                Yeah, I’m worried about that too. Like, Pihole can’t block YouTube ads because they’re served from the same domain. Same with Twitch. So that could become even more common.

  • RubberDuck
    7
    2 months ago

    Require someone that wants to add stuff to pay a small amount to the Wikimedia Foundation for activating their account and refund it if they moderate a certain amount.

    • aubertlone
      7
      2 months ago

      Yeah, I mean, I’ve had minor edits reversed because I didn’t source the fact properly.

      And that was like 10 years ago. I’m surprised these edits are getting through in the first place.

      • @[email protected]
        6
        2 months ago

        Seems like that would be an easy problem to solve… require all edits to have a peer review by someone with a minimum credibility before they go live. I can understand when Wikipedia was new, allowing anyone to post edits or new content helped them get going. But now? Why do they still allow any random person to post edits without a minimal amount of verification? Sure it self-corrects given enough time, but meanwhile what happens to all the people looking for factual information and finding trash?

        • @[email protected]
          3
          2 months ago

          Or at least give it a certain amount of time before it goes live. So if nobody comes around to approve it in 24 hours, it goes live.

          Usually bad edits are corrected within hours, if not minutes, so that should catch the lion’s share w/o bogging down the approval queue too much.

        • RubberDuck
          0
          2 months ago

          Crowdsourcing is the strength that led to the vast resource and also the weakness, as displayed here. So there will probably be a need for some form of barrier. Hence my suggestion.