Websites must use a properly configured robots.txt file with rules specifically telling OpenAI’s bot, GPTBot, to leave the site alone. (OpenAI also has a couple of other bots, ChatGPT-User and OAI-SearchBot, each with its own user-agent token, according to its information page on its crawlers.)
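For reference, opting out looks something like this — a minimal robots.txt sketch using the user-agent tokens OpenAI documents for its three crawlers:

```
# Tell OpenAI's crawlers to stay out of the whole site
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /
```

The file lives at the site root (e.g. https://example.com/robots.txt), and compliance is voluntary on the crawler's part.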
This sounded weird, so I poked around some. It turns out sites can block all bots and then explicitly allow a few (e.g. Google and Bing). If you want to be choosy as an admin, you’re going to have to put some work in. That sucks. It’s almost like bots need to be typed for something like ‘AI’, ‘search’, ‘archival’, etc.
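The block-everything-then-allowlist pattern looks roughly like this (a sketch; the allowed user agents here are just examples):

```
# Deny every crawler by default
User-agent: *
Disallow: /

# Explicitly allow the search engines you want
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /
```

And since there's no standard bot typing, you have to track down and list every user-agent token you care about yourself, one group at a time.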