This comic follows on from the previous comic, which will almost certainly provide context.

You might not wanna be famous, but when you’re level 10, every organization within a mile is watching what you’re doing.

  • @[email protected]
    25 · 3 months ago

    Someone on Lemmy suggested creating a dummy endpoint that normal people won’t be able to navigate to, and disallowing it in robots.txt.

    Then when somebody crawls it, you know they are ignoring robots.txt, and you IP-ban them.
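
A minimal sketch of the trap described above, not the commenter's actual implementation. The path `/trap/`, the `HoneypotGuard` class, and the in-memory ban set are all hypothetical names chosen for illustration; a real site would wire the same check into its web server or firewall.

```python
from dataclasses import dataclass, field

# Hypothetical honeypot path: never linked anywhere a human would click.
TRAP_PATH = "/trap/"


def robots_txt(trap_path: str = TRAP_PATH) -> str:
    """Build a robots.txt body asking every crawler to stay out of the trap."""
    return f"User-agent: *\nDisallow: {trap_path}\n"


@dataclass
class HoneypotGuard:
    """Tracks IPs that requested the disallowed path (in memory only)."""

    trap_path: str = TRAP_PATH
    banned: set[str] = field(default_factory=set)

    def status_for(self, ip: str, path: str) -> int:
        """Return the HTTP status to send for a request from `ip` to `path`."""
        if ip in self.banned:
            return 403  # already caught ignoring robots.txt
        if path.startswith(self.trap_path):
            self.banned.add(ip)  # only a robots.txt-ignoring client gets here
            return 403
        return 200


guard = HoneypotGuard()
print(robots_txt())
print(guard.status_for("203.0.113.7", "/trap/page"))  # crawler hits the trap
print(guard.status_for("203.0.113.7", "/comics/1"))   # later requests refused
```

The ban is keyed on the IP address alone, so it only catches a crawler that reuses an address after touching the trap.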

    • AhdokOP
      15 · 3 months ago

      That’s pretty clever.

      I think that these AI scrapers might be smart enough that this doesn’t really work though - at least if I were designing them I’d have them all come from dynamic IPs and not have any of them bother hitting the same target more than once. These things are very dedicated to acquiring content without consent, and if they’re capable of causing problems for (say) Reddit, I’m not sure my little website is going to have much luck deterring them.

      Honestly a better strategy might be to just glaze everything I draw.

      • Johanno
        7 · 3 months ago

        I am not sure if it costs money, but you could implement captchas.

        Or use Cloudflare to do the bot detection for you.

        Worst case you make it so you need to create an account to see content.

        • AhdokOP
          4 · 3 months ago

          Well, we are already using Cloudflare, that’s one of the other reasons why the site is so slow… I don’t think the other two suggestions prevent a scraper from requesting the information from the server… I think they’d just make it more arduous for real people to access the content.

      • @[email protected]
        4 · 3 months ago

        “Honestly a better strategy might be to just glaze everything I draw.”

        I doubt that will help: they can still scrape the site and then wait until whatever version of Glaze was applied is cracked.

      • @Lumisal
        2 · 3 months ago

        Instead of a tech solution, why not a legal one? State somewhere on the website that refusing to follow your robots.txt constitutes agreement to pay you x amount of money for your content. Then combine that with the dummy page solution the other person brought up so you can record the IP address, and take them to court so they pay you. It has the potential to bring you a really, really nice chunk of money.

        • AhdokOP
          5 · 3 months ago

          I believe that there are multiple very high profile billion-dollar lawsuits being run against AI companies right now. I don’t really have the budget to sue these companies.