What's the best way to search the fediverse?

@[email protected] · 4 months ago

What's the best way to search the fediverse?

@LovableSidekick · 3 months ago

Thanks for putting so much time and thought into the discussion. All the problems you talk about exist for every search engine in actual use today. For example, publishing a site on a brand new domain has the exact problem you’re describing with spinning up a new Forte instance. There can be a 24-hr lag before DNS can reliably find the site. Perfect search is an aspirational goal. The realistic goal is to satisfy most needs. No matter how many words you throw at it, I don’t think federated search is an outlandish idea at all.

Jupiter Rowland · 3 months ago

I’m not just talking about a 24-hour lag. I’m talking about parts of the Fediverse never being discovered at all. After all, the Fediverse doesn’t have a central registry of its own, comparable to DNS, that lists every instance and nothing else, where a search crawler could simply look them up.

Even if someone developed a Web search crawler much like the Google Bot, something that crawls the entire WWW looking for Fediverse instances, how is it supposed to tell Fediverse instances from websites that aren’t Fediverse instances?

I bet the first two proposals for solutions wouldn’t work with (streams).

The first proposal would probably be to go by the instance type, like “mastodon” or “lemmy” or “mbin” or “akkoma” or “misskey” or whatever. This, however, would require valid instance types to be manually added to a kind of config file in which the search crawler could look them up. This, in turn, would only work if the list were constantly kept complete and up-to-date.

This means: Whenever someone launches a new project, the identifier of this project will have to be added to the list. Whenever someone forks something into a new project, ditto. Now let the devs of the crawler have as little time as the Plume devs or as the sole Firefish dev early this year, and the list of Fediverse instance types will spend months outdated with new projects missing, and the crawler won’t recognise the instances of these new projects as Fediverse instances.

Oh, and it wouldn’t work with (streams) at all. See, (streams) is intentionally without a name, without a brand identity and even without a unified, pre-defined, fixed instance type. It isn’t like all instances identify as “streams” or “(streams)”. Some identify as “streams”, but many others have unique types. The crawler wouldn’t know these identifiers as valid Fediverse instance types (how is that crawler supposed to know that “bunny of doom” is a Fediverse identifier), and thus, it wouldn’t be able to identify (streams) instances as Fediverse instances.
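To make the failure concrete, here is a minimal sketch of such an allowlist check. The set contents and the function name are made up for illustration and not taken from any real crawler.

```python
# Hypothetical allowlist of instance types; a real crawler's list would be
# far longer and would have to be maintained by hand.
KNOWN_INSTANCE_TYPES = {"mastodon", "lemmy", "mbin", "akkoma", "misskey"}


def looks_like_fediverse(instance_type: str) -> bool:
    """Return True only if the reported instance type is on the allowlist."""
    return instance_type.strip().lower() in KNOWN_INSTANCE_TYPES


# A free-text type that nobody added to the list is rejected, even though
# the instance behind it federates just fine.
for reported in ["Mastodon", "y", "bunny of doom"]:
    print(f"{reported!r}: {looks_like_fediverse(reported)}")
```

Anything an admin types into a free-text field lands on the wrong side of that check until someone adds it to the list by hand.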

Now you could say that (streams) is so tiny that it wouldn’t hurt to sweep it under the rug. Nobody would notice.

But that’s exactly the problem. One of the (streams) users is the guy who created (streams) and everything before it all the way back to Mistpark in 2010, the one man who developed more Fediverse protocols and server applications than anyone, the man who invented nomadic identity and magic single sign-on: Mike Macgirvin. He is on one out of only two instances that identify as “y” (because Y is not X).

He is one of the few people in the Fediverse who actually post about what’s possible in the Fediverse that goes way beyond Mastodon. Not only possible, but readily available right now. He started advertising (streams) in the wake of the mass-migration of Twitter users to Mastodon. And if his most recent creation, Forte, manages to take off, he’ll probably advertise that. If (streams) wasn’t caught by crawlers, nobody would read his advertisement except those who already follow him, and I guess half of them already know his creations and what they can do.

Hard-coding the custom identifiers of (streams) instances into the list is a stupid idea, too. The instance type is not defined upon installation in a config file. It’s an admin-side free-text field that can be changed anytime with no consequences for connections, just because the admin feels like it.

Okay, so here’s the second proposal: Go for nodeinfo. The problem this time: Mike has also intentionally removed almost all nodeinfo code from (streams). He didn’t want (streams) to participate in that eternal rat race between Fediverse projects and Fediverse instances for the best stats on FediDB, Fediverse Observer and The Federation. In fact, (streams) is entirely absent from all three. This, too, is intentional.
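For reference, a nodeinfo-based check could look roughly like the sketch below. It only parses response bodies that some crawler would already have fetched; the function names, the sample data and the placeholder host social.example are assumptions for illustration. What a (streams) instance actually answers on that endpoint isn’t covered here, so it is simply modelled as a missing response.

```python
import json
from typing import Optional

# Prefix of the "rel" values used in NodeInfo discovery documents.
NODEINFO_REL_PREFIX = "http://nodeinfo.diaspora.software/ns/schema/"


def _parse_object(body: Optional[str]) -> dict:
    """Parse a JSON object, returning {} for missing or malformed input."""
    if not body:
        return {}
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError:
        return {}
    return parsed if isinstance(parsed, dict) else {}


def nodeinfo_href(discovery_body: Optional[str]) -> Optional[str]:
    """Find the NodeInfo document URL in a /.well-known/nodeinfo response."""
    links = _parse_object(discovery_body).get("links")
    if not isinstance(links, list):
        return None
    for link in links:
        if isinstance(link, dict) and str(link.get("rel", "")).startswith(NODEINFO_REL_PREFIX):
            return link.get("href")
    return None


def software_name(nodeinfo_body: Optional[str]) -> Optional[str]:
    """Read software.name from a NodeInfo document, if there is one."""
    software = _parse_object(nodeinfo_body).get("software")
    if isinstance(software, dict):
        return software.get("name")
    return None


# Hypothetical responses; social.example is a placeholder host.
discovery = json.dumps({"links": [{
    "rel": "http://nodeinfo.diaspora.software/ns/schema/2.0",
    "href": "https://social.example/nodeinfo/2.0",
}]})
document = json.dumps({"version": "2.0", "software": {"name": "mastodon", "version": "4.2.0"}})

print(nodeinfo_href(discovery))   # URL of the NodeInfo document
print(software_name(document))    # software name reported by the instance
print(nodeinfo_href(None))        # no nodeinfo response: nothing to go on
```

An instance that doesn’t serve nodeinfo gives the crawler nothing but `None`, which looks exactly like a website that isn’t part of the Fediverse at all.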

If anyone has a better idea, I’m all ears.