I would like to try building a search index for this instance (and maybe others), so I'd like to crawl the site with automated spiders. With the shutdown of the Reddit API, I expect the site to come under quite substantial load, and of course I would try not to spam it with too many requests, so as not to get banned or blocked for looking like a DoS attack. Could anyone provide some information on this?
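One common way to keep a crawler polite is to honor the site's `robots.txt` (including any `Crawl-delay`) and enforce a minimum gap between requests to the same host. Below is a minimal sketch in Python using only the standard library. The user agent string, the contact address, and the 2-second default delay are hypothetical placeholders, not values any instance has asked for. The class takes the `robots.txt` lines directly so it can be exercised offline; in a real crawl you would download `https://<instance>/robots.txt` first (for example with `RobotFileParser.set_url()` and `.read()`).

```python
import time
import urllib.request
from urllib.robotparser import RobotFileParser

# Hypothetical user agent: identify your crawler and give admins a way to reach you.
USER_AGENT = "ExampleSearchBot/0.1 (+mailto:you@example.org)"


class PoliteCrawler:
    """Fetches pages from one instance, honoring robots.txt and a minimum delay."""

    def __init__(self, robots_lines, default_delay=2.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.robots = RobotFileParser()
        self.robots.parse(robots_lines)
        # Use the site's Crawl-delay if it declares one, else our own (assumed) default.
        declared = self.robots.crawl_delay(USER_AGENT)
        self.delay = float(declared) if declared is not None else default_delay
        self.clock = clock  # injectable so the pacing logic can be tested
        self.sleep = sleep
        self.last_request = None

    def allowed(self, url):
        """True if robots.txt permits our user agent to fetch this URL."""
        return self.robots.can_fetch(USER_AGENT, url)

    def wait_turn(self):
        """Block until at least self.delay seconds have passed since the last request."""
        now = self.clock()
        if self.last_request is not None:
            remaining = self.delay - (now - self.last_request)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_request = now

    def fetch(self, url):
        """Fetch a URL if allowed, after waiting our turn; returns bytes or None."""
        if not self.allowed(url):
            return None
        self.wait_turn()
        req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.read()
```

Keeping one `PoliteCrawler` per host means each instance gets its own pacing, so crawling several instances in parallel does not multiply the load on any single one.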

    • @BelirielOP
      11 year ago

      Oh nice, I’ll check it out. I guess I’ll have to learn it, but that shouldn’t pose much of a problem. Btw, I sent you a request on Discord (I assume it’s the same username).

  • @ericjmorey
    11 year ago

    Many existing Fediverse services are operated by people who are opposed to indexing the content on their instance(s). You may run into resistance from that angle.

    • @BelirielOP
      11 year ago

      I mean, unless they make their instance private, I don’t see why you wouldn’t index it? That’s literally what made Google so valuable in its early days.

      • @ericjmorey
        11 year ago

        Even Google doesn’t index webpages that include “noindex” in a header. You are going to run into a lot of people who don’t agree with what you are trying to do. If you start reaching out to the people running Fediverse services to let them know that you’re trying to index the data on their services, you can learn what they think of the idea.
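For a crawler, honoring "noindex" usually means checking two places: the `X-Robots-Tag` HTTP response header and a `<meta name="robots">` tag in the page's HTML. Below is a minimal sketch in Python, assuming you already have the response headers as a dict-like object and the body decoded as a string. As a deliberate simplification, it treats bot-specific rules such as `otherbot: noindex` as applying to your crawler too, erring on the side of not indexing.

```python
from html.parser import HTMLParser

# "none" is shorthand for "noindex, nofollow".
NOINDEX_TOKENS = {"noindex", "none"}


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases tag and attribute names; values may be None.
        attrs = {name: (value or "") for name, value in attrs}
        if attrs.get("name", "").strip().lower() == "robots":
            for token in attrs.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def is_noindexed(headers, html):
    """True if the X-Robots-Tag header or a robots meta tag asks not to be indexed.

    Conservative: a bot-specific prefix like "otherbot: noindex" is treated as
    applying to us as well.
    """
    for key, value in headers.items():
        if key.lower() == "x-robots-tag":
            tokens = {t.strip().lower() for t in value.replace(":", ",").split(",")}
            if tokens & NOINDEX_TOKENS:
                return True
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return bool(parser.directives & NOINDEX_TOKENS)
```

Running this check before a page is added to the index (not just before it is fetched) lets you respect the directive even when `robots.txt` allows crawling.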