Correct me if I’m wrong. I read the ActivityPub standard and dug a little into the lemmy sources to understand how federation works, and I’m a bit disappointed. Every server just has a cache and the ability to fetch things from other known servers. So if you start your own instance, there is no benefit to the network as a whole until you attract a significant share of the audience; private instances or servers with no users don’t help. Are there any “balancers” to make use of these empty instances? Should we promote (or create in the first place) a way to passively help lemmy cope with such fast growth?
Care to expand on this point?
Disclaimer: I’ve only looked a bit at the protocols and high-level descriptions of how it works, and this is just my understanding of it. But it seems to track.
Let’s take … [email protected] for example. Right now lemmy.world is the source of truth for it, which means that if you subscribe to it from a different host, let’s say myawersomeinstance.com, that host first contacts lemmy.world, copies over the posts, and then subscribes to new ones. I’m actually not 100% sure whether lemmy.world contacts myawersomeinstance.com when there’s a new post, or myawersomeinstance.com polls lemmy.world… But anyway, the point is that lemmy.world is the authority on it. myawersomeinstance.com also has [email protected] data, but it’s only a copy; lemmy.world is the sole authority. So if you post something, your server sends it to lemmy.world and waits for a reply. Then lemmy.world contacts every instance that has at least one user following the community to tell it about the new post. And that new post now exists in a few hundred databases.
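On the push-vs-poll question: ActivityPub’s server-to-server delivery works by POSTing activities to the recipients’ inboxes, so the push model is what the spec describes. Below is a toy model of that fan-out, assuming one delivery per follower *instance* rather than per user. The `Community` class, the community name, and `other.example` are hypothetical; this is not lemmy’s code.

```python
from collections import defaultdict


class Community:
    """Toy model of a community hosted on one 'source of truth' instance."""

    def __init__(self, name: str, home: str) -> None:
        self.name = name
        self.home = home
        # follower instance -> set of local users on it who follow the community
        self.followers = defaultdict(set)
        self.posts = []

    def follow(self, user: str, instance: str) -> None:
        self.followers[instance].add(user)

    def publish(self, post: str) -> list:
        """Store the post at home, then return the remote instances it is pushed to.

        One delivery per instance with at least one follower, no matter how
        many users on that instance follow the community.
        """
        self.posts.append(post)
        return sorted(inst for inst, users in self.followers.items()
                      if users and inst != self.home)


community = Community("example_community", home="lemmy.world")  # hypothetical name
community.follow("alice", "myawersomeinstance.com")
community.follow("bob", "myawersomeinstance.com")
community.follow("carol", "other.example")  # hypothetical instance
deliveries = community.publish("Hello, fediverse")
print(deliveries)  # ['myawersomeinstance.com', 'other.example']
```

Two followers on myawersomeinstance.com still cost lemmy.world a single delivery, which is why the load tracks the number of instances rather than the number of users.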
The problem is the scaling is whack. Okay, you can have 5000 federated servers with users subscribed to [email protected], but that means lemmy.world needs to update 5000 servers per post, there’ll be 5000x the storage used for that post, and ALL 5000 servers contact lemmy.world to get the new good stuff.
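Back-of-envelope arithmetic for that scenario. Only the 5000-instance figure comes from the comment above; the post rate and post size are hypothetical placeholders, not measurements from lemmy.world.

```python
def fanout_cost(instances: int, posts_per_day: int, bytes_per_post: int) -> tuple:
    """Return (deliveries per day, bytes stored per day across follower instances).

    Assumes every follower instance receives and stores a full copy of every
    post; the home instance's own copy is not counted.
    """
    deliveries = instances * posts_per_day
    stored = instances * posts_per_day * bytes_per_post
    return deliveries, stored


# Hypothetical inputs: 5000 follower instances, 1000 posts/day, 2 KB per post.
deliveries, stored = fanout_cost(5000, 1000, 2_000)
print(deliveries)  # 5000000
print(stored)      # 10000000000
```

Both numbers grow linearly with the number of follower instances, so doubling the instances doubles the outgoing deliveries from the single home server.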
Frankly, it’s a scaling nightmare. As an alternative, you could use private/public keys: lemmy.world signs its updates, and the other instances fetch the new data from each other and check the signature. That would also allow more relaxed caching, since re-fetching the data would generally cost less. Right now you need aggressive caching because you don’t want lemmy.world to keel over and die from every server on the planet wanting the latest and greatest posts all the time.
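A toy sketch of the signed-relay idea, assuming lemmy.world signs each post once and any peer may serve the copy: the receiver checks the signature against lemmy.world’s public key, so it doesn’t matter who relayed it. To stay stdlib-only this uses textbook RSA over a SHA-256 hash with two well-known Mersenne primes, which is deliberately INSECURE and only illustrates the flow; a real system would use a vetted library and scheme (e.g. Ed25519 or RSA-PSS). The function names are hypothetical.

```python
import hashlib

# Toy key pair for the origin instance. INSECURE: these primes are public
# (Mersenne primes 2**61-1 and 2**89-1); use a real crypto library in practice.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest(post: bytes) -> int:
    """SHA-256 of the post, reduced into the RSA modulus range."""
    return int.from_bytes(hashlib.sha256(post).digest(), "big") % N


def sign(post: bytes) -> int:
    """The origin instance signs with its private exponent D."""
    return pow(digest(post), D, N)


def verify(post: bytes, signature: int) -> bool:
    """Any instance checks the signature using only the public key (N, E)."""
    return pow(signature, E, N) == digest(post)


# lemmy.world signs once; pretend this pair was then fetched from a peer instance.
post = b"New post in the community"
sig = sign(post)
relayed = (post, sig)
print(verify(*relayed))               # True
print(verify(b"Tampered post", sig))  # False
```

The point of the design is that verification needs only the origin’s public key, so the expensive part (serving the bytes) can be spread across peers while trust stays with lemmy.world.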
Thanks for the in-depth write-up! I haven’t looked too far into the docs or the subscription model, but is this a fault on Lemmy’s end, or is this a function of how ActivityPub handles federated communication? (I’m very new to ActivityPub/federation and am just now reading through the ActivityPub docs.)
I do like your idea of distributed replication via keys, much better than what I had brainstormed.
Edit: yeah, it does look like it’s a function of ActivityPub. I wonder if there’s a more scalable federation protocol out there.
Could lemmy.world put a load balancer in front and use it to direct requests to different instances of lemmy.world? Not sure if that question is dumb; I’m not a technical guy.
It’s not dumb at all, and it’s a common scaling technique. But the software needs to support it, and I have no idea if lemmy has support for running multiple instances for one server.
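For illustration, the simplest load-balancing policy, round-robin, fits in a few lines. The backend names are hypothetical, and this says nothing about whether lemmy can actually run as several processes behind a balancer; as the reply says, the software has to support it (typically the backends must share state such as the database).

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands each incoming request to the next backend in a fixed rotation."""

    def __init__(self, backends: list) -> None:
        if not backends:
            raise ValueError("need at least one backend")
        self._next = cycle(backends)

    def route(self, request_path: str) -> str:
        """Return the backend for this request (round-robin ignores the path)."""
        return next(self._next)


# Hypothetical backend processes that would all serve the one lemmy.world domain.
lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
routed = [lb.route("/some/path") for _ in range(4)]
print(routed)  # ['app-1', 'app-2', 'app-3', 'app-1']
```

Round-robin spreads incoming traffic across processes, but it does not reduce the outgoing fan-out to other instances; every post still has to be delivered once per follower instance.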
Seeing Lemmy groan under the influx of new users, still a much smaller number than established centralized apps have, made me start wondering how it would scale to numbers a couple of orders of magnitude larger. I’ve only started diving into the code and architecture, but I’m worried that as the number of instances grows, they’ve got an O(N²) connection problem going (N(N-1)/2 possible server pairs). This is not a simple problem to fix for a federated system, but it’s got to be addressed eventually.
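For a sense of scale: the number of distinct server-to-server pairs among N instances is N(N-1)/2, which grows quadratically (a factorial would grow far faster). The instance counts below are arbitrary examples.

```python
from math import comb


def possible_links(n: int) -> int:
    """Distinct server-to-server pairs among n instances: n * (n - 1) / 2."""
    return comb(n, 2)


for n in (10, 100, 1000, 10_000):
    print(n, possible_links(n))
# 10 45
# 100 4950
# 1000 499500
# 10000 49995000
```

Going from 100 to 10,000 instances multiplies the possible pairs by roughly 10,000. That is an upper bound across the whole network; for any single community with push delivery, the home instance’s load grows linearly with its follower instances.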