• @RememberTheApollo
    English
    32
    5 months ago

    It found that most of the internet is translated, as 57.1 percent of the sentences in the corpus were multi-way parallel in at least three languages.

    What a shit title. The article basically is about translating languages and goes on to say that AI is doing it badly.

    Not that 50% of all web content is AI generated.

    Shame on whoever wrote this clickbait garbage.

    • LughOPM
      English
      9
      5 months ago

      Not that 50% of all web content is AI generated.

      On the contrary. It explicitly states that it is.

      To quote - "It found that most of the internet is translated, as 57.1 percent of the sentences in the corpus were multi-way parallel in at least three languages."

      So in other words, the majority of web content is AI translations of other content. As it’s often poorly translated, or entirely mistranslated, it qualifies as “AI-generated garbage” - hence the headline.

      • @[email protected]
        English
        7
        5 months ago

        Technically, but I think I and a lot of other readers thought it was talking about original content from AIs, as opposed to translated.

        I have noticed those sites with answers to commonly searched questions, which look very convincing and have AI generated “authors” as well as a topic-specific URL, but then sometimes lose the plot of the question halfway through. I almost fall for them, and I’m a crusty internet person, so I can only imagine how many people are just totally swallowing the info.

        • snooggums
          1
          5 months ago

          ‘Original content’ from AI is just regurgitated content with adjustments.

          • @[email protected]
            English
            1
            5 months ago

            How many adjustments does it take before it’s new content? If the answer is a lot, are humans ever original either?

            • snooggums
              2
              5 months ago

              Humans are frequently unoriginal, which is why they get caught copying existing things with adjustments. But they also make new things based on existing things, adding something new and significantly different, even when they reuse some parts of existing things in an original way.

              The Thing, Predator, and Alien are all about otherworldly beings who hunt humans, but would you consider them regurgitated content with adjustments?

              The Thing is about fear of other people, with an alien monster.

              Predator is about macho men being outclassed, with an alien monster.

              Alien is about sexual assault, with an alien monster.

              AI won’t accidentally create anything comparable, because these three movies aren’t even the output of a single human. Hell, even books are not the output of a single person. They have editors and reviews and collaboration that involves sharing of knowledge and is influenced by experiences that AI won’t stumble upon by accident. AI will create the direct-to-video knock-offs that just copy existing media to profit, because AI is like an executive who always tries to make what is already proven to work because it is seen as reliable.

              • @[email protected]
                English
                1
                5 months ago

                Alright, that’s a weaker claim (that is, less of an extraordinary claim) than I was expecting. LLMs aren’t quite as good as a human at conceptual originality yet, and I can’t prove they will catch up, especially if thematic subtext is the measure.

                I guess I’ll just say my original point stands then. There’s a difference between something made from a prompt by ChatGPT, and something produced from a roughly equivalent text by a translation tool.

      • @Grimy
        English
        7
        5 months ago

        You do not generate text when you translate it. The two words have different meanings.

        • @Baahb
          English
          3
          5 months ago

          If you translate a sentence once using a computer, it’s probably a translation. If you translate a translation, you are using a computer to regenerate computer generated content, even if it started with a human seed in the first translation. The two words only have different meanings in specific contexts. They CAN mean the same thing, but don’t necessarily or even often.

          In this case though, the article does suggest that AI is taking AI content and rewriting it, or translating from “English” to “English” a bunch of times. Which is both translation and generation.

          • @Grimy
            English
            2
            5 months ago

            English to English would be rewording and would totally fall under AI generated garbage, but the article doesn’t seem to mention this. It’s entirely about English to other languages, mostly in Africa and the global south.

            Although taking articles and translating them is using AI, I don’t think that’s what most people associate with “AI generated garbage”, hence the clickbait.

            It’s an interesting article, I just think the headline is misleading.

  • Endorkend
    30
    5 months ago

    It’s obvious when watching Google and Bing results.

    Try to find any sort of objective information and the first 3-4 pages will almost all be AI generated garbage that took most of the information from some other highly outdated source that was garbage to begin with.

    And as the engines are AI, they can automatically manipulate search results and keep dates and time stamps updated, so that whenever Google visits, the page is always the “newest” information.

    • snooggums
      17
      5 months ago

      When the first five results are the same sentences worded slightly differently, like a freshman essay, it is not a good sign that I will find a real answer.

      • Endorkend
        15
        5 months ago

        The most annoying thing is that almost all tech information has fallen victim to this shit.

        We now have to go back to pre-2000s methods of searching sites, by first identifying sites as reliable and then by relying on the sites’ own search engines to not suck.

        In some cases, this is workable.

        In cases where the sites have integrated Google searches, this is even more useless than using Google itself.

        • LughOPM
          English
          5
          5 months ago

          Someone should invent a search engine that allows for curated sources. For most things, I’d love to search among the top few thousand sites, and exclude everything else.

          • Apathy Tree
            English
            5
            5 months ago

            I haven’t used Kagi, but I believe you can do exactly that with it. You do have to pay for the service, but that’s probably a good thing.

            This is a link to the features page. It allows you to permanently ban or boost results from specific domains. But you may need to do some manual effort to make that happen, I don’t really know if there are community-curated backbones or anything for that.

            But you can also see if the result is popular, and they seem to work pretty hard to make their platform worth the spend. Everything I’ve heard from people who use it is good.

            https://blog.kagi.com/kagi-features

          • Endorkend
            4
            5 months ago

            I’ve got exactly that running on my home network for tech stuff.

            I’ve thought of opening it up and even been thinking of building a group of people trustworthy enough to do the curation of sites, but I generally CBA interacting with people that much. I used to be highly active on forums like Madonion/futuremark, [H], etc, but those days are long behind me. These days, I post a bit on Reddit and talk to my wife and that’s about it.

            If things keep going to shit as much as they have, I may open it up anyway, mostly because maintaining and re-curating sites is a drag on its own.

            The number of sites that were once great tech spots and then got gulped up by the same ol same ol big tech sites to be turned into generic shit isn’t just uncountable, it’s almost every single one of them.

            The best still seems to be simply posting questions on the few OG computer/tech forums that managed to survive.

            For hardware and OS, places like ServeTheHome, [H], AnandTech, TechPowerUp, etc.

            For programming information, it’s so murky I can’t even suggest any specific sites anymore, not even Stack.

            Phone/Tablet info, even XDA is getting murky, mostly because a lot of users there only watch the forum for their specific device, so if yours isn’t one that is used by a lot of people, info gets super limited.

            It’s gotten bad out there.

          • Semi-Hemi-Demigod
            4
            5 months ago

            Yahoo started out like this. They had humans curating the sites that they searched, and it was pretty good until the web got too big for that to be efficient.

          • Endorkend
            3
            5 months ago

            No need to invent.

            That’s how search engines originally worked, including Google, Yahoo and all the other big ones.

            You didn’t get indexed by default.

            You either got indexed by being submitted or by being referenced often by one or more well represented sites.

            It’s only later in the game they started crawling everything.

    • @asdfasdfasdf
      English
      3
      5 months ago

      I wonder if this will push humanity to go back to books and libraries.

      • @[email protected]
        English
        2
        5 months ago

        Books and texts on paper have one big advantage: they are not as easy to change as digital texts.

  • @saltesc
    English
    10
    5 months ago

    Oh, we know.

  • 🇰 🔵 🇱 🇦 🇳 🇦 🇰 ℹ️
    English
    8
    5 months ago

    Man, I have been accused of being AI tons of times in the last few years. I don’t think people are very good at distinguishing reality from AI when it comes to text.

    Researchers at the Amazon Web Services AI lab found that over half of the sentences on the web have been translated into two or more languages…

    They attribute this to machine learning algorithms, yet even without those, translations of translations of translations also lose accuracy when done by people.

  • LughOPM
    English
    4
    5 months ago

    One of the ironies of Google leading so much cutting-edge AI development is that it is simultaneously poisoning its own business from within. Google Search is getting worse and worse, on an almost monthly basis, as it fills up with ever more SEO spam. Early adopters are abandoning it for ChatGPT-like alternatives, which means the mass market probably soon will too.

    The other irony is that it will probably take AI to save us from AI-generated SEO spam. For everyone touting AI products that will write blogs and emails, there will be people selling products that detect their garbage and save you from wasting your time reading it.

  • @[email protected]
    English
    3
    5 months ago

    I would argue that most webpages have been generated without human input for a long time. So many automated scams with sketchy download links were the norm years before any ‘modern AI’ was a thing.

  • @[email protected]
    English
    2
    5 months ago

    I hear the message but to be honest, I can’t believe it. There must be something I don’t get. But on second thought, in the Google search results, I see a lot of dubious results.