Due to Lemmy's default robots.txt and meta tags, search engines will index even non-local communities. This can lead to undesirable results, such as unrelated or objectionable content from other instances being associated with your instance.

As of today, lemmy-ui does not allow hiding non-local (or any) communities from Google and other search engines. If you, like me, do not want your instance to be associated with other instances' content, you can add a custom robots.txt and response headers to prevent indexing.

In nginx, simply add this:

# Disallow all search engines
location / {
    ...
    add_header X-Robots-Tag noindex;
}

location = /robots.txt {
    default_type text/plain;
    return 200 "User-agent: *\nDisallow: /\n";
}
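To sanity-check the robots.txt body returned above, you can feed it to Python's standard urllib.robotparser; this is just a quick sketch, with example.com standing in for your instance's domain:

```python
import urllib.robotparser

# The exact body the nginx location above returns.
robots_txt = "User-agent: *\nDisallow: /\n"

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# Every path is disallowed for every crawler.
print(rp.can_fetch("Googlebot", "https://example.com/c/somecommunity"))  # False
print(rp.can_fetch("*", "https://example.com/"))  # False
```

Note that well-behaved crawlers blocked by robots.txt never fetch your pages at all, so they never see the X-Robots-Tag header; the header is a belt-and-suspenders measure for crawlers that fetch pages anyway.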

Here’s a commit in my fork of the lemmy-ansible playbook. And here’s a corresponding issue I opened in lemmy-ui.

I hope this helps someone :-)

  • NXL, 1 year ago

    Please don't do this; keep information easy to google. The best part of Reddit was how many hours it saved when googling for information on stuff.

    • @Tandybaum, 1 year ago

      I just found this thread because I was curious about the indexing of Lemmy.

      I totally agree with you. One of the best parts of Reddit is that when you google a super weird niche question, you get a bunch of Reddit links.
