• @[email protected]
      15 · 3 months ago (edited)

      but not the misuse of public content

      Is that an admission that they don’t own the content others posted on their site?

      • FurblandOP
        4 · 3 months ago

        you would be a good lawyer

    • @Aeri
      66 · 3 months ago

      oh no, Reddit is like, the only way to have google still be useful.

      • @[email protected]
        54 · 3 months ago

        Funnily enough, google is also the only way to have Reddit be useful.

        Their own search function has been nothing but garbage.

      • @[email protected]
        43 · 3 months ago

        That’s the catch: Google made a deal with Reddit and remains the only search engine allowed to access its data for indexing. It cuts off every other search engine.

        • vortic
          27 · 3 months ago

          Tell me that there is an antitrust suit over this.

          • FurblandOP
            26 · 3 months ago

            There’s a suit over google in general so this may well be part of it

        • @TriflingToad
          3 · 3 months ago

          Really? DDG still shows me Reddit links. Did they have to make a web scraper or something?

      • @riodoro1
        31 · 3 months ago

        We fucked the internet. It’s proprietary now.

        • FurblandOP
          1 · 3 months ago

          That’s bad news, that means the internet is dying

    • FurblandOP
      9 · 3 months ago

      Perhaps, likely depends on the crawler though

      • @[email protected]
        12 · 3 months ago

        Yeah, I don’t think ignoring robots.txt is even illegal. They can of course just block your crawler’s IP, but that would be a cat-and-mouse game that they would lose in the end.

  • @JusticeForPorygon
    54 · 3 months ago

    Not gonna lie, this seems like ultimately a win for the Internet. The years of troubleshooting solutions Reddit provided can be archived (hopefully), but the fewer people rely on the site itself, the better. At least in my opinion.

    • @TriflingToad
      2 · 3 months ago

      I disagree, kinda. Stack Overflow is the other option for questions, and it is a lot less user friendly, and Lemmy has never shown up in search results for me. If something comes along and makes it simple, great! However, I just see a lot more ad-filled hellhole sites in the meantime.

  • @Kojichan
    52 · 3 months ago

    I remember finding Google’s robots.txt when they first came out. It was a cute little text ASCII art of a robot with a heart that said, “We love robots!”

    • FurblandOP
      60 · 3 months ago

      this is actually quite recent. the old one was much funnier and clearly had actual soul put into it.

  • @[email protected]
    8 · 3 months ago

    As annoying as this is, it’s to prevent LLMs from training themselves using Reddit content, and that’s probably the greater of the two evils.

    • FurblandOP
      37 · 3 months ago

      That’s all well and good, but how many LLMs do you think actually respect robots.txt?

      • @[email protected]
        English
        14 · 3 months ago

        from my limited experience, about half? i had to finally set up a robots.txt last month after Anthropic decided it would be OK to crawl my Wikipedia mirror from about a dozen different IP addresses simultaneously, non-stop, without any rate limiting, and bring it to its knees. fuck them for it, but at least it stopped once i added robots.txt.

        Facebook, Amazon, and a few others are ignoring that robots.txt, on the other hand. they have the decency to do it slowly enough that i’d never notice unless i checked the logs, at least.
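
        For anyone wondering what "respecting robots.txt" looks like on the crawler side, here is a rough sketch using Python's standard `urllib.robotparser`. The bot name `ExampleBot`, the `example.org` URL, and the rules themselves are made up for illustration; this is not the actual file from the mirror above. Note that the check is entirely voluntary: a crawler that never runs it is not stopped by anything in the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler, allow everyone else.
# An empty "Disallow:" line means nothing is disallowed for that group.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.org/wiki/Some_Page"
    print("ExampleBot allowed:", is_allowed("ExampleBot", page))
    print("OtherBot allowed:", is_allowed("OtherBot", page))
```

        A polite crawler would run a check like this before every request and skip anything disallowed. The ones ignoring the file simply never ask, which is why the only real enforcement left is blocking by IP or user agent on the server.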

    • Anas
      12 · 3 months ago

      It’s to prevent LLMs from training themselves using reddit content, unless they pay the party that took no part in creating said content

      FTFY