Lemmy Devs: could you shed light on the range of scalability issues we're about to see with the reddit influx?

teoria@sopuli.xyz · 2 years ago

PriorProject · 2 years ago

You may or may not get Lemmy devs weighing in here (Edit: Nutomic did respond). It’s a VERY busy time for them, and they’re probably focused on fixing imminent scaling issues rather than explaining them to newcomers like us. But to provide some context from another newcomer who is trying to pay attention:

1. Lemmy was very small until very recently. The biggest instance is lemmy.ml, which according to the stats on its homepage has ~30k registered users and ~2k active (which is probably a high-water mark… in previous days when I looked it was more like 1k).
2. As a result of (1), it’s a fair bet that there are some serious inefficiencies in the codebase that just never mattered before now. It will take some time to unwind these. A good example is https://github.com/LemmyNet/lemmy/issues/2877, where you can see Lemmy devs and Lemmy.ml admins cooperating with a Postgres expert who is helping them find some low-hanging performance fruit, and the Lemmy team is getting a chance to ask performance-related questions they’ve never before had an expert to ask. There’s probably a lot more work like this to do in order to scale Lemmy to work well with instances 10x bigger and beyond.
3. There may be distributed/federated performance issues as the network grows as well, but Lemmy uses ActivityPub like Mastodon, which already has a much bigger network. I’m inclined to think they’ll be ok in this regard, but you never know… it’s possible they’re abusing the protocol in some way that will need to be fixed to scale to bigger networks of federated servers.
4. In terms of hardware, lemmy.ml runs on a very modest 8-core VM from OVH: https://lemmy.world/comment/1350. Obviously there’s a LOT more that could be done to get more capacity powering lemmy.ml. Much bigger single servers exist, though not in their current provider’s lineup of VM offerings, which means there are no more “easy” upgrades available where they let the cloud provider do the migration work. I tried to break down infra upgrade possibilities in https://lemmy.world/comment/3583. In short, it would be straightforward to expand a Lemmy install to 5-10 machines if you were serious about it. But due to (1) and (2), it’s probably not productive to do so. Algorithmic inefficiencies in the codebase would probably swamp any amount of hardware somewhere between 1.5x and 5x the user/post/comment counts of what lemmy.ml runs today.
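To make the “inefficiencies swamp hardware” point a bit more concrete, here’s a back-of-the-envelope toy model in Python. It’s purely illustrative and every number in it is an assumption, not a Lemmy measurement: one machine handles today’s load, adding machines scales capacity perfectly linearly (optimistic), and the two hypothetical cost functions stand in for an ideal query versus one that scans data growing with the instance.

```python
import math
from typing import Callable

# Toy model only: these cost functions are hypothetical illustrations,
# not measurements of Lemmy or lemmy.ml.
CostFn = Callable[[float], float]


def constant_cost(growth: float) -> float:
    """Ideal query: each request costs the same no matter how much data exists."""
    return 1.0


def linear_scan_cost(growth: float) -> float:
    """Inefficient query: each request touches data that grows with the instance."""
    return growth


def total_load(growth: float, cost: CostFn) -> float:
    """Requests scale with users (growth), multiplied by the cost of each request."""
    return growth * cost(growth)


def machines_needed(growth: float, cost: CostFn) -> int:
    """Machines required, assuming one machine handles today's load (growth == 1.0)
    and that adding machines scales capacity perfectly linearly (optimistic)."""
    return math.ceil(total_load(growth, cost) / total_load(1.0, cost))


if __name__ == "__main__":
    for growth in (1.0, 1.5, 2.0, 3.0, 5.0, 10.0):
        print(
            f"{growth:>4}x users: "
            f"{machines_needed(growth, constant_cost):>3} machines (constant cost), "
            f"{machines_needed(growth, linear_scan_cost):>3} machines (linear scan)"
        )
```

Under these made-up assumptions, 5x the users needs 5 machines if per-request cost stays flat but 25 if every request scans data that grows with the instance, and a 10-machine setup only covers about 3.16x growth (the square root of 10) in the second case. Real queries won’t follow either curve exactly, but it shows why fixing hot queries tends to buy more headroom than adding servers.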

There’s a lot of speculation in this comment. I haven’t run or perf-tested a sizeable Lemmy instance, and I’m not familiar with the codebase. But I am a software engineer and I know a lot about scaling infra, software, and teams… and the points above feel like reasonably informed guesses, absent disagreement from someone better informed than I am.