So after we’ve extended the virtual cloud server twice, we’re at the max for the current configuration. And with this crazy growth (almost 12k users!!), the server is already getting closer and closer to capacity.
Therefore I decided to order a dedicated server. Same one as used for mastodon.world.
So the bad news… we will need some downtime. Hopefully, not too much. I will prepare the new server, copy (rsync) stuff over, stop Lemmy, do a last rsync and change the DNS. If all goes well it should take maybe 10 minutes of downtime, 30 at most. (With mastodon.world it took 20 minutes, mainly because of a typo :-) )
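For anyone curious what that copy / stop / copy-again sequence looks like, here is a rough sketch that only builds the commands and prints them. The paths, the host name and the stop command are made up for illustration (the post doesn't say how Lemmy is run or where its data lives), and the DNS change happens at the DNS provider, so it isn't scripted.

```python
import shlex


def migration_plan(src_dir, dest, stop_cmd):
    """Return the commands for a two-pass rsync migration, in order.

    src_dir, dest and stop_cmd are hypothetical placeholders supplied by
    the caller; nothing here is taken from the real server setup.
    """
    def rsync():
        # -a keeps permissions/ownership/timestamps, --delete removes files
        # on the destination that no longer exist on the source.
        return ["rsync", "-a", "--delete", src_dir, dest]

    return [
        rsync(),                 # 1. bulk copy while the service is still up
        shlex.split(stop_cmd),   # 2. stop the service (downtime starts here)
        rsync(),                 # 3. final pass, only sends what changed since 1
        # 4. switch DNS to the new server (done by hand, not shown)
    ]


if __name__ == "__main__":
    # Assumed example values, not the real ones.
    plan = migration_plan("/srv/lemmy/", "newserver:/srv/lemmy/",
                          "systemctl stop lemmy")
    for cmd in plan:
        print(shlex.join(cmd))
```

The point of the two passes is that the first rsync does the heavy lifting while users are still online, so the downtime only has to cover the second, much smaller transfer.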
For those who would like to donate to cover server costs, you can do so at our OpenCollective or Patreon.
Thanks!
Update: The server was migrated. It took around 4 minutes of downtime. For those who asked, it now uses a dedicated server with an AMD EPYC 7502P 32 Cores “Rome” CPU and 128GB RAM. Should be enough for now.
I will be tuning the database a bit, so that should give some extra seconds of downtime, but just refresh and it’s back. After that I’ll investigate the cause of the slow posting further. Thanks @[email protected] for assisting with that.
The code is open source on GitHub and the backend is written in Rust.
I have no idea how it goes in terms of scaling…
Apparently it’s not ideal at horizontal scaling (that’s what I’ve picked up from reading stuff here, could be wrong)
I think they can horizontally scale the Postgres maybe? Postgres is probably the biggest performance bottleneck.
Databases are also the hardest bit to horizontally scale. Web servers are easy cos they’re (usually) stateless. It’s state that’s hard to scale out.
Have they implemented the postgres? Last I read they were still using websockets (I think. I’m not a programmer and don’t know what all that means lmfao)
Postgres is a database. Websockets is a communication method between the browser and the server.
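So the infrastructure is like this: the browser talks to the Lemmy server over websockets, and the server talks to the Postgres database.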
So there’s a couple problems here. First of all, websockets are very resource heavy, so too many of them will slow down the server. That’s why they are working on replacing websockets with something else. And second, the database (Postgres) is getting overloaded, so they need to figure out how to scale it up or use it more efficiently.
Man, the place I work at has a single DB instance (with a read replica) serving millions of users. I’m not saying this should be true everywhere, but I don’t understand how the postgres is buckling here. Does Lemmy have a bunch of horrifically unoptimized queries, or is the DB just on an underpowered machine?
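To make the "single DB instance with a read replica" idea concrete, here is a toy sketch of the routing logic: reads go to a replica, everything else goes to the primary. The class name and the connection labels are invented for the example, and it says nothing about how Lemmy itself connects to Postgres.

```python
import itertools


class ReplicaRouter:
    """Toy read/write splitter (hypothetical, for illustration only)."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replicas are configured.
        self._replicas = itertools.cycle(replicas or [primary])

    def route(self, sql):
        """Return which connection should run this statement."""
        words = sql.split(None, 1)
        first = words[0].upper() if words else ""
        if first == "SELECT":
            # Plain reads can be spread across the replicas round-robin.
            return next(self._replicas)
        # Writes (and anything we are unsure about) go to the primary.
        return self.primary


if __name__ == "__main__":
    router = ReplicaRouter("primary", ["replica1"])
    print(router.route("SELECT * FROM post"))              # replica1
    print(router.route("INSERT INTO post VALUES (1)"))     # primary
```

A real setup also has to deal with replication lag (a read right after a write may not see it yet) and with statements like SELECT ... FOR UPDATE that start with SELECT but still need the primary, which this sketch ignores.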
Yes to both. Lemmy does have a few PRs to make the queries more efficient (and not just blind generic ORM calls) but most instances outside of lemmy.world are very underpowered (which makes federation synchronization slow).
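As a generic illustration of what "blind ORM calls" versus a hand-tuned query can mean, here is the classic N+1 pattern next to a single aggregated query. The tables are made up and are not Lemmy's actual schema, and in-memory SQLite stands in for Postgres so the example runs anywhere. Whether the Lemmy PRs fix this exact pattern is an assumption on my part.

```python
import sqlite3


def setup():
    """Create a tiny hypothetical schema with 3 posts and 3 comments."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE comment (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT);
        INSERT INTO post VALUES (1, 'a'), (2, 'b'), (3, 'c');
        INSERT INTO comment (post_id, body) VALUES (1, 'x'), (1, 'y'), (3, 'z');
    """)
    return conn


def counts_n_plus_one(conn):
    """One query for the posts, then one more query per post."""
    queries = 1
    result = {}
    for (post_id,) in conn.execute("SELECT id FROM post").fetchall():
        (n,) = conn.execute(
            "SELECT COUNT(*) FROM comment WHERE post_id = ?", (post_id,)
        ).fetchone()
        queries += 1
        result[post_id] = n
    return result, queries


def counts_single_query(conn):
    """Same result with one JOIN + GROUP BY round trip."""
    rows = conn.execute("""
        SELECT p.id, COUNT(c.id)
        FROM post p LEFT JOIN comment c ON c.post_id = p.id
        GROUP BY p.id
    """).fetchall()
    return dict(rows), 1


if __name__ == "__main__":
    conn = setup()
    print(counts_n_plus_one(conn))    # ({1: 2, 2: 0, 3: 1}, 4)
    print(counts_single_query(conn))  # ({1: 2, 2: 0, 3: 1}, 1)
```

With three posts the difference is 4 queries versus 1. With a front page of a few hundred posts, the per-post version turns into hundreds of round trips per page load, which is the kind of thing that makes a database look overloaded long before the hardware really is.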