• @Darkard
    English
    66
    11 months ago

    It’s going to drive the AI into madness, as it will be trained on bot posts written by itself in a never-ending loop of more and more incomprehensible text.

    It’s going to be like putting a sentence into Google Translate, running it through 5 different languages and then back into the first, and getting complete gibberish.

    • @echo64
      English
      52
      11 months ago

      AI actually has huge problems with this. If you feed AI-generated data into models, then the new training falls apart extremely quickly. There does not appear to be any good solution for this, the equivalent of AI inbreeding.

      This is the primary reason why most AI models aren’t trained on anything past 2021. The internet is just too full of AI-generated data.
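
The "falls apart" effect described above can be illustrated with a toy sketch. This is not how any real language model is trained: it assumes the "model" is just a Gaussian that is refit, each generation, to a small batch of samples drawn from the previous generation's fit. The function name `simulate_collapse` and its parameters are made up for illustration.

```python
import random
import statistics


def simulate_collapse(generations=300, sample_size=10, seed=0):
    """Toy model: refit a Gaussian to its own samples, generation after generation.

    Returns the fitted standard deviation at each generation, starting with
    the original (generation 0) value.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0 stands in for "human" data
    history = [sigma]
    for _ in range(generations):
        # "Generate content" with the current model...
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        # ...then "train" the next model on nothing but that content.
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)
        history.append(sigma)
    return history


if __name__ == "__main__":
    spread = simulate_collapse()
    for gen in (0, 10, 50, 100, 300):
        print(f"generation {gen:3d}: spread = {spread[gen]:.3g}")
```

In this toy setup the spread tends to shrink toward zero, because every refit loses a little of the original variety and nothing ever puts it back. Whether real models degrade in the same way or at the same speed is a separate question that this sketch does not answer.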

      • @givesomefucks
        English
        28
        edit-2
        11 months ago

        There does not appear to be any good solution for this

        Pay intelligent humans to train AI.

        Like, have grad students talk to it in their area of expertise.

        But that’s expensive, so capitalist companies will always take the cheaper/shittier routes.

        So it’s not that there’s no solution, there’s just no profitable solution. Which is why innovation should never be solely in the hands of people whose only concern is profits.

        • @SinningStromgald
          English
          8
          11 months ago

          OR they could just scrape info from the “aska____” subreddits and hope and pray it’s all good. Plus that is like 1/100th the work.

          The racism, homophobia and conspiracy levels of AI are going to rise significantly scraping Reddit.

          • @givesomefucks
            English
            8
            11 months ago

            Even that would be a huge improvement.

            Just have a human decide what subs it uses, but they’ll just turn it loose on the whole website.

        • @General_Effort
          English
          1
          11 months ago

          Haha. Grad students expensive. God bless.

      • @T156
        English
        9
        11 months ago

        And unlike with images, where it might be possible to embed a watermark that can be used to filter them out, it’s much harder to pinpoint whether text is AI-generated or not, especially if you have bots masquerading as users.

      • @Ultraviolet
        English
        5
        11 months ago

        This is why LLMs have no future. No matter how much the technology improves, they can never have training data past 2021, which becomes more and more of a problem as time goes on.

        • TimeSquirrel
          0
          11 months ago

          You can have AIs that detect other AIs’ content and can make a decision on whether to incorporate that info or not.
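
The idea above amounts to putting a detector in front of the training data. A minimal sketch, assuming a detector that returns a score between 0 and 1 for "probably AI-written": `filter_corpus`, `ai_score`, and `toy_ai_score` are hypothetical names, and the toy scorer is a stand-in that only flags one stock phrase, not a real detector.

```python
from typing import Callable, Iterable, List


def filter_corpus(
    posts: Iterable[str],
    ai_score: Callable[[str], float],
    threshold: float = 0.5,
) -> List[str]:
    """Keep only posts whose estimated 'AI-written' score is below threshold."""
    return [post for post in posts if ai_score(post) < threshold]


def toy_ai_score(text: str) -> float:
    # Hypothetical stand-in, NOT a real detector: flags a single stock phrase.
    return 0.9 if "as an ai language model" in text.lower() else 0.1


if __name__ == "__main__":
    posts = [
        "As an AI language model, I cannot have opinions on grad students.",
        "Haha. Grad students expensive. God bless.",
    ]
    print(filter_corpus(posts, toy_ai_score))
```

The filter is only as good as the scorer plugged into it, so everything hinges on whether a reliable `ai_score` can exist at all.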

            • TimeSquirrel
              2
              edit-2
              11 months ago

              Doesn’t look like we’ll have much of a choice. They’re not going back into the bag.
              We definitely need some good AI content filters. Fight fire with fire. They seem to be good at this kind of thing (pattern recognition), way better than any procedurally programmed system.

          • @echo64
            English
            2
            11 months ago

            Fun fact: you can’t. AIs are surprisingly bad at distinguishing AI-generated things from real things.

    • RuBisCO
      English
      4
      11 months ago

      What was the subreddit where only bots could post, and they were named after the subreddits that they had trained on/commented like?