• @[email protected]
    21
    5 months ago

    How would a site make itself accessible to the internet in general while also preventing itself from being scraped by technical means?

    robots.txt does rely on being respected, just like no-trespassing signs. The lack of enforcement is the problem, and keeping robots.txt to track the permissions would make it effective again.

    I am agreeing, just with a slightly different take.
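For reference, this is roughly what the permission-tracking side of robots.txt looks like in practice. A minimal sketch; the crawler name `ExampleScraper` is illustrative, not a real bot:

```
# Refuse one specific crawler, allow everyone else.
# Nothing enforces this -- a crawler that ignores it
# still sees the same pages, which is the point above.
User-agent: ExampleScraper
Disallow: /

User-agent: *
Allow: /
```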

    • Album
      1
      5 months ago

      User-agent matching is rather effective. You can serve different responses based on the UA.

      So generally people will use robots.txt to handle the bots that play nice, and then use user-agent checks to manage the abusers.
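The two-tier approach above can be sketched as follows. This is a minimal, hypothetical example: the blocklist entries are real scraper User-Agent substrings (GPTBot, CCBot, Bytespider), but a production setup would live in the web server or a WAF rule, not application code:

```python
# Sketch: pick an HTTP status code based on the User-Agent header.
# Bots that honor robots.txt never reach the blocked paths at all;
# this check is the backstop for the ones that don't play nice.
BLOCKED_AGENTS = ("gptbot", "ccbot", "bytespider")

def response_for(user_agent: str) -> int:
    """Return the HTTP status to serve for this User-Agent."""
    ua = (user_agent or "").lower()
    if any(bot in ua for bot in BLOCKED_AGENTS):
        return 403  # refuse known scrapers outright
    return 200  # serve the page to everyone else

print(response_for("Mozilla/5.0 (Windows NT 10.0)"))  # 200
print(response_for("GPTBot/1.0"))                     # 403
```

The obvious caveat, which the thread alludes to: a determined scraper can simply lie about its User-Agent, so this only manages abusers who identify themselves.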