I would like to try building a search index for this instance (and maybe others), so I would like to crawl the site with automated spiders. With the shutdown of the Reddit API, I expect the site to come under quite substantial load, and I would of course try not to flood it with requests, so that I don't get banned or blocked for looking like a DoS attack. Could anyone provide some information on this?

  • @ericjmorey
    11 year ago

    Even Google doesn’t index webpages that include “noindex” in a robots meta tag or an X-Robots-Tag response header. You are going to run into a lot of people who don’t agree with what you are trying to do. If you reach out to the people running Fediverse services to let them know that you’re trying to index the data on their services, you can learn what they think of the idea.
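
For anyone who wants to honor "noindex" in their own crawler, here is a rough sketch of the check in Python, using only the standard library. The names `is_noindex` and `_RobotsMetaParser` are made up for this example, and it is a simplification: it ignores crawler-specific forms such as `googlebot: noindex` and does not look at robots.txt at all.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def is_noindex(headers, html):
    """Return True if the response asks not to be indexed.

    headers: dict of HTTP response headers (hypothetical input shape).
    html:    the response body as a string.
    """
    blocking = {"noindex", "none"}  # "none" means noindex plus nofollow

    # 1. The X-Robots-Tag response header.
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            tokens = {token.strip().lower() for token in value.split(",")}
            if tokens & blocking:
                return True

    # 2. The robots meta tag in the page itself.
    parser = _RobotsMetaParser()
    parser.feed(html)
    return bool(parser.directives & blocking)
```

A crawler would call `is_noindex(response_headers, body)` after each fetch and skip adding the page to the index when it returns True.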