I really like the Lemmy Community Browser at browse.feddit.de for locating communities across instances.

When I first stood up my instance, I guess it was crawled and my community showed up there. Awesome!

I’ve since rebuilt on a new domain and federated again, and my old instance dropped off (as expected). However, a day or so later, my new instance still isn’t showing up there. The new instance is known to join-lemmy.org and fediverse.observer, so it is discoverable.

I’m curious if anyone knows how often it updates and if that tool is based on a project I can clone to spin up another instance. I’m interested in both running a replica of it to act as another entry point as well as using it as a base to develop some quality of life enhancements. e.g. a click to subscribe option that ties into your home instance and does the initial search and subscribe steps.

  • @[email protected]
    link
    fedilink
    English
    71 year ago

    It updates 4 times a day, and a new instance should be crawled once it’s referenced by another instance that’s already known. Instances also get dropped on high response times. Otherwise, I have no idea, but I could take a look…
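    Roughly, the selection works like this (the function names and the 5-second threshold here are illustrative, not the actual crawler code):

```python
# Illustrative sketch of the crawler's selection rules described above:
# an instance is only picked up if an already-known instance links to it,
# and it is dropped when its measured response time is too high.

def select_instances(known, peers, response_times, max_seconds=5.0):
    """Return the set of instances to keep for the next crawl.

    known          -- instances already in the index
    peers          -- dict: instance -> set of instances it links to
    response_times -- dict: instance -> last response time in seconds
    """
    # Discover anything referenced by a known instance
    discovered = set(known)
    for inst in known:
        discovered |= peers.get(inst, set())

    # Drop instances that responded too slowly (or not at all)
    return {
        inst for inst in discovered
        if response_times.get(inst, float("inf")) <= max_seconds
    }
```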

    • Admiral Patrick (OP) · 2 points · 1 year ago

      Thanks for the info!

      Well, it should have been discovered by now, I would think. My first instance was picked up the same day it went live, so that fits with the 4x a day interval you’re describing.

      Any idea on what is considered a high response time? I’m using my instance, and it seems pretty snappy.

        • Admiral Patrick (OP) · 2 points · edited · 1 year ago

          Ok, thanks. My response times are way less than that unless there is some other issue.

          I’ve been dealing with a lot of rate-limit errors in the Lemmy backend, and that may have been my fault. I’m using the PROXY protocol in front of Nginx, and I didn’t have the right variable set to populate the X-Forwarded-For header from it ($remote_addr vs $proxy_protocol_addr). I usually set those via an include for common proxy headers, but I had to remove that include and add the lines manually because some of my defaults were breaking Lemmy UI.
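          For reference, the relevant part of the config now looks roughly like this (the trusted address range and upstream name are placeholders, not my exact setup):

```nginx
# Accept PROXY protocol from the load balancer and pass the
# real client IP to Lemmy instead of the proxy's own address.
server {
    listen 443 ssl proxy_protocol;

    # Trust the load balancer as the PROXY protocol source
    # (placeholder range -- adjust to your LB's address).
    set_real_ip_from 10.0.0.0/8;
    real_ip_header proxy_protocol;

    location / {
        # "lemmy-ui" is a placeholder upstream name
        proxy_pass http://lemmy-ui;
        proxy_set_header Host $host;
        # $proxy_protocol_addr (not $remote_addr) carries the original
        # client IP when PROXY protocol is in use
        proxy_set_header X-Forwarded-For $proxy_protocol_addr;
        proxy_set_header X-Real-IP $proxy_protocol_addr;
    }
}
```

          With $remote_addr there, every request appears to come from the load balancer itself — exactly the condition that trips a per-IP rate limit.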

          So it’s possible my instance was erroneously rate-limiting the crawler, which contributed to that. I made the fix to that Nginx config about half an hour ago; I’ll wait several hours and re-check the community browser to see if that was the cause.

          One last question: is the crawler based on an existing project I can pull and fork?

          Thanks for your insight.

          • @[email protected]
            link
            fedilink
            English
            41 year ago

            So it may be possible my instance was rate-limiting the crawl which contributed to that?

            That might be it.
            And the more instances that link to yours, the more likely it is to be crawled, in case other instances drop out.

            I’d have to update the repo at Codeberg, but right now it has low prio for me as I broke my hand and try to stay away from the keyboard ^^

            • Admiral Patrick (OP) · 3 points · 1 year ago

              and the more instances link to yours, the more likely it will be crawled, in case other instances drop out.

              I’ve got a lot of peers in my instances list (lemmy.ml, beehaw.org, feddit.de are the largest peers), and I’ve clicked into the /instances page of many of them; they all show my domain as a peer, so I believe I’m good there.

              but right now it has low prio for me as i broke my hand and try to stay away from the keyboard

              Well, I definitely thank you for powering through to answer my dumb questions, and I do appreciate your time.

                  • Admiral Patrick (OP) · 1 point · 1 year ago

                    There it is!

                    Thanks for checking back.

                    I want to say the blame lies 100% with me on this one.

                    When I moved over to the new domain, I forgot to disable my default WAF policy for robots.txt, which is set to disallow all. I realized that late yesterday evening and turned that rule off in my load balancer.
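                    For reference, a disallow-all robots.txt is just:

```
User-agent: *
Disallow: /
```

                    so any well-behaved crawler would skip the whole site until that rule is removed.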