Looks like it works.
Edit: we still see some performance issues. More troubleshooting is needed.
Update: registrations re-opened. We encountered a bug where people could not log in (see https://github.com/LemmyNet/lemmy/issues/3422#issuecomment-1616112264). As a workaround we re-opened registrations.
Thanks
First of all, I would like to thank the Lemmy.world team and the two admins of other servers, @[email protected] and @[email protected], for their help! We did some thorough troubleshooting to get this working!
The upgrade
The upgrade itself isn’t too hard: create a backup, change the image names in the docker-compose.yml, and restart.
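For reference, the change boils down to bumping the image tags in docker-compose.yml. The tags below are illustrative, not our exact file; use the actual 0.18.1-rc tag you are upgrading to:

```yaml
# docker-compose.yml (excerpt) -- image tags are illustrative
services:
  lemmy:
    image: dessalines/lemmy:0.18.1-rc.4      # previously a 0.17.x tag
    restart: always
  lemmy-ui:
    image: dessalines/lemmy-ui:0.18.1-rc.4
    restart: always
```

After editing, `docker compose pull && docker compose up -d` pulls the new images and recreates the containers.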
But, as with the first two attempts, after a few minutes the site started slowing down until it stopped responding. Then the troubleshooting started.
The solutions
What I had noticed previously is that the lemmy container would top out at around 1500% CPU usage, and above that the site got slow. That is odd, because the server has 64 threads, so 6400% should be the maximum. So we tried what @[email protected] had suggested before: we created extra lemmy containers (and extra lemmy-ui containers) to spread the load, and used nginx to load-balance between them.
Et voilà. That seems to work.
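For anyone curious how the load balancing is wired up: it is a plain nginx upstream block. This is a minimal sketch, assuming container names and the default Lemmy ports rather than our exact config:

```nginx
# nginx.conf (excerpt) -- container names and ports are illustrative
upstream lemmy-backend {
    # one entry per lemmy container started by docker compose
    server lemmy-1:8536;
    server lemmy-2:8536;
    server lemmy-3:8536;
}

upstream lemmy-frontend {
    server lemmy-ui-1:1234;
    server lemmy-ui-2:1234;
}

server {
    listen 80;
    server_name lemmy.world;

    # the real config routes API/ActivityPub traffic to the backend and
    # everything else to lemmy-ui based on path and Accept headers;
    # simplified here
    location /api/ {
        proxy_pass http://lemmy-backend;
    }

    location / {
        proxy_pass http://lemmy-frontend;
    }
}
```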
Also, as he suggested, we start the load-balanced lemmy containers with the scheduler disabled, and run one extra lemmy container with the scheduler enabled that is not used for anything else.
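In docker-compose terms that split looks roughly like the sketch below. The service layout is the point here; the scheduler switch is shown as a placeholder flag because the exact mechanism depends on the Lemmy version you run, so check the lemmy_server options for your release:

```yaml
# docker-compose.yml (sketch) -- the scheduler switch below is a
# placeholder; check how your Lemmy version exposes it
services:
  # load-balanced workers, scheduler off
  lemmy-1:
    image: dessalines/lemmy:0.18.1-rc.4
    command: lemmy_server --disable-scheduled-tasks   # placeholder flag
    restart: always
  lemmy-2:
    image: dessalines/lemmy:0.18.1-rc.4
    command: lemmy_server --disable-scheduled-tasks   # placeholder flag
    restart: always

  # one instance with the scheduler enabled; nginx sends no user
  # traffic to this one
  lemmy-scheduler:
    image: dessalines/lemmy:0.18.1-rc.4
    restart: always
```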
There is still room for improvement, and probably new bugs, but we’re very happy that lemmy.world is now on 0.18.1-rc, which fixes a lot of bugs.
Have you considered running your Lemmy instance on more than a single machine? If it is possible to run two lemmy containers anyway (i.e., lemmy is not a singleton), why not run them on separate machines? With load balancing you could achieve a more stable experience. It might also be cheaper to have many mediocre machines rather than a single powerful one, and more sustainable long-term (vertical vs. horizontal scaling).
The downside is that the setup would be less obvious than with Docker Compose, and you would probably need to get into k8s/k3s/Nomad territory to orchestrate a proper fleet.
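As a rough illustration (names, image tag, and ports are my assumptions), a k8s setup would replace the hand-rolled nginx balancing with a Deployment plus a Service:

```yaml
# k8s sketch -- names and image tag are illustrative
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lemmy
spec:
  replicas: 3                     # spread across nodes by the scheduler
  selector:
    matchLabels:
      app: lemmy
  template:
    metadata:
      labels:
        app: lemmy
    spec:
      containers:
        - name: lemmy
          image: dessalines/lemmy:0.18.1-rc.4
          ports:
            - containerPort: 8536
---
# a Service in front of the pods provides the load balancing
apiVersion: v1
kind: Service
metadata:
  name: lemmy
spec:
  selector:
    app: lemmy
  ports:
    - port: 8536
      targetPort: 8536
```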
The whole Lemmy app is a single monolith, too. K8s will probably help, but at some point the app will need to move toward some kind of distributed setup that can run as a single instance or be broken out into multiple parts.
Any improvements to performance will probably come with downsides for ease of setup, but I’m sure there are people out there who could simplify the process.