• @affiliate · 114 hours ago

    from the article:

    Robots.txt is a line of code that publishers can put into a website that, while not legally binding in any way, is supposed to signal to scraper bots that they cannot take that website’s data.

    i do understand that robots.txt is a very minor part of the article, but i think that’s a pretty rough explanation of robots.txt

      • @[email protected] · 34 hours ago

        List of files/pages that a website owner doesn’t want bots to crawl. Or something like that.

        • @NiHaDuncan · 3 hours ago (edited)

          Websites actually just list broad areas, as listing every file/page would be far too verbose for many websites and impossible for any website that has dynamic/user-generated content.
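          For instance, a made-up robots.txt that blocks broad areas rather than individual pages might look like this (the paths here are invented for illustration, not taken from any real site):

          ```
          # Hypothetical example
          User-agent: *
          Disallow: /admin/
          Disallow: /search
          ```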

          You can view examples by going to almost any website’s base URL and adding /robots.txt to the end of it.

          For example www.google.com/robots.txt
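A fetched robots.txt can also be checked programmatically. As a rough sketch (the rules and bot name below are invented for illustration), Python's standard-library `urllib.robotparser` applies the same prefix matching a well-behaved crawler would:

```python
from urllib import robotparser

# Hypothetical rules: one page is allowed inside an otherwise-blocked area.
rules = [
    "User-agent: *",
    "Allow: /search/about",
    "Disallow: /search",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# The broad "Disallow: /search" covers everything under that prefix...
blocked = rp.can_fetch("ExampleBot", "https://example.com/search?q=cats")
# ...except the explicitly allowed page, which matches the Allow rule first.
allowed = rp.can_fetch("ExampleBot", "https://example.com/search/about")
print(blocked, allowed)  # prints: False True
```

Note that rule order matters to this parser: the first matching rule wins, which is why the more specific `Allow` line comes before the broad `Disallow`.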