They aren’t behind any login or anything else that would stop it. So yeah, I expect they’re already being indexed.
I’m worried the Fediverse is going to be an SEO nightmare though.
That’s a good point. The same content exists on multiple instances. I think Lemmy should set a canonical URL in the HTML <head>. The canonical URL of each post should point to the instance where the post originated.
Seems like that is not implemented in Lemmy. I also checked Mastodon, and it doesn’t have a canonical tag either.
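If anyone wants to repeat that check, here is a rough sketch in Python that looks for a `<link rel="canonical">` tag in a page's HTML. The two sample pages and the post URL are made up for illustration; they are not real Lemmy or Mastodon output, and you would need to feed in the actual page source yourself.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated values, so split before checking.
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical pages: one declares where the post originated, one does not.
page_with_tag = (
    '<html><head><title>Post</title>'
    '<link rel="canonical" href="https://lemmy.ml/post/123">'
    '</head><body></body></html>'
)
page_without_tag = "<html><head><title>Post</title></head><body></body></html>"

print(find_canonical(page_with_tag))
print(find_canonical(page_without_tag))
```

A federated copy of a post would ideally carry the tag pointing back at the originating instance, so this kind of check should return that instance's URL rather than `None`.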
In the browser, I see a little fediverse icon next to every post/comment that links to the canonical URL. I don’t think traditional HTML search engines know how to index it, though. It’s probably better if we have our own Lemmy search engine like browse.feddit.de
You can search posts on lemmy using Google already. They are indexed as separate sites, so you may have to use “site:lemmy.ml” or “site:beehaw.org” in order to find a post. I do wonder if major search engines will try to handle federation more comprehensively in the future, though.
Here’s an example Google search, with these operators:
(site:lemmy.world OR site:lemmy.ml OR site:beehaw.org OR site:feddit.de OR site:sh.itjust.works OR site:lemmy.one OR site:lemmy.ca)
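For anyone who doesn't want to type that out by hand, here is a small Python sketch that builds the same kind of query from a list of instances. The function names and the search terms are just placeholders I picked; the instance list is the one from the query above.

```python
from urllib.parse import urlencode


def build_site_query(terms, instances):
    """Combine search terms with an OR-ed list of site: operators."""
    site_filter = " OR ".join(f"site:{host}" for host in instances)
    return f"{terms} ({site_filter})"


def google_search_url(query):
    """Turn a query string into a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


instances = [
    "lemmy.world",
    "lemmy.ml",
    "beehaw.org",
    "feddit.de",
    "sh.itjust.works",
    "lemmy.one",
    "lemmy.ca",
]

query = build_site_query("fediverse seo", instances)
print(query)
print(google_search_url(query))
```

The obvious limitation is that you have to maintain the instance list yourself, which is part of why a federation-aware search would be nicer.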
Yes, actually it’s already getting indexed. For example, you can try searching for site:lemmy.ml on DDG or Google. Although it’ll probably take a while before search engines deem Lemmy instances “popular enough” for posts to show up for regular search queries (assuming that’ll even happen at all). There is a similar topic on beehaw.
Yes, lemmy posts can be indexed and found, but there are disadvantages compared to big, centralized services. I just found some posts on ecosia page 3.
I’m not sure if posts from instances without ‘lemmy’ in their name would show up when somebody searches for “something lemmy”.
I checked my instance, and here are the contents of the robots.txt file.
User-Agent: *
Disallow: /login
Disallow: /settings
Disallow: /create_community
Disallow: /create_post
Disallow: /create_private_message
Disallow: /inbox
Disallow: /setup
Disallow: /admin
Disallow: /password_change
Disallow: /search/
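To see what those rules mean for a crawler, here is a quick sketch using Python's standard `urllib.robotparser`. The rules are the ones quoted above; the host `lemmy.example` and the post path are placeholders, not a real instance.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules quoted above, one directive per line.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /login
Disallow: /settings
Disallow: /create_community
Disallow: /create_post
Disallow: /create_private_message
Disallow: /inbox
Disallow: /setup
Disallow: /admin
Disallow: /password_change
Disallow: /search/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, user_agent="*"):
    """Check whether a well-behaved crawler may fetch the given path."""
    # lemmy.example is a placeholder host for illustration only.
    return parser.can_fetch(user_agent, "https://lemmy.example" + path)


for path in ["/post/123", "/login", "/search/?q=lemmy"]:
    print(path, "allowed" if allowed(path) else "blocked")
```

Post pages are not covered by any Disallow line, so a crawler that respects robots.txt is free to fetch them, while the login and search pages are off limits.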
Legitimate search engines will index everything except what’s disallowed. Of course, the robots.txt could be changed to block all indexing by legitimate search engines.
AFAIK they already are; you can google with site:lemmy.ml