Restart an OOM killed docker automatically

@RustyNova · 8 months ago

Restart an OOM killed docker automatically

lemmyng · 8 months ago

Use -m and limit the build job’s memory so it doesn’t kill the docker daemon.
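
For anyone following along, a minimal sketch of what the `-m` suggestion can look like for `docker run`. The helper name `run_limited` and the image name are hypothetical, and this covers running a container only; whether the same applies to image builds is not covered here.

```shell
# Sketch: start a container with a hard memory cap.
# "run_limited" and its arguments are hypothetical names for illustration.
run_limited() {
  limit="$1"   # memory cap, e.g. 2g
  image="$2"   # image to run, e.g. my-build-image (placeholder)
  shift 2
  # -m / --memory caps the container's memory. Setting --memory-swap to the
  # same value means the container gets no extra swap on top of that cap.
  docker run --rm -m "$limit" --memory-swap "$limit" "$image" "$@"
}
```

Something like `run_limited 2g my-build-image cargo build` would then run the job with a 2 GB ceiling, so a runaway build gets killed inside its own cgroup instead of taking the host down.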

@RustyNova · 8 months ago

Fair enough. But I don’t want a bandaid fix. Even more so because I do all my Docker work through Portainer, and the option isn’t there.

It could also be useful if a container has a memory leak and no memory limit.

@just_another_person · 8 months ago

This isn’t a band-aid, it’s the literal fix.

Structuring the available CPU and Memory reservations for containers is LITERALLY the entire reason containers exist. Just because you’re only familiar with the “dumb” way of using them doesn’t mean you should be dismissive when someone offers you advice when you come here asking for it.

You’re also seemingly just a dick for being lazy, because I looked, and wuddyaknow. So now you’re just rude, dickish, and lazy.

Take the advice from the original responder, and then go and learn how to use the things you’re asking for help with, along with some manners.

@[email protected] · edit-2 3 months ago

deleted by creator

Bo7a · 8 months ago

You can’t expect people who are knowledgeable about this stuff to just forever accept that someone asks for advice, gets told the solution, and then ignores/belittles the person with knowledge.

This is our daily life experience. We get hired to be experts, and get told by non-experts that our solutions are not tenable every single day. Only for that solution to eventually be accepted when the user in question figures out their idea was not useful and the expert was correct.

We have to put up with it at work, we are not obliged to accept it here.

@[email protected] · edit-2 3 months ago

deleted by creator

Bo7a · 8 months ago

In which way am I complaining? I am explaining why calling a valid solution a bandaid might be construed as belittling their very real knowledge of this process. And how that is a regular pattern in a lot of technical fields.

And don’t give me this shit about ‘I’m not the person you were talking to’. This is an open forum, not a direct/private message.

@[email protected] · 8 months ago

You sound like you work in product

@[email protected] · edit-2 3 months ago

deleted by creator

@just_another_person · 8 months ago

I was obliged to respond to let him know that he was actually provided the correct answer, and he didn’t need to respond to the person who provided the correct answer like that. I don’t feel it’s right to sit idly by and let people who are only trying to help for free get snark like that. Obliged, much.

@RustyNova · 8 months ago

There’s a difference between helping people who misunderstand a tool and belittling them for being wrong. It’s just a matter of wording that separates a helpful answer from a toxic one.

I could tell you “You should actually use Y instead of X. There are numerous benefits like A, B and C. The doc actually has a great example you may have missed or not understood was for this purpose. It will help you a lot more than what you are thinking of doing.” And this would be fine.

But “Just use Y. X is bad because Y is made for that. You not being willing to use Y shouldn’t make you do X. The first Google link even shows how to do it” isn’t fine.

And I have not belittled them at all. I have said that it wasn’t what I was looking for. A lot of times people post questions they think should solve their issue, only to realise that they didn’t understand the full picture and their problem is on a larger scale.

@RustyNova · edit-2 8 months ago

Alright, sorry for calling it a “bandaid fix”. It just wasn’t the right term for what I wanted to say. I was more referring to how it would only fix issues during builds, and not at actual runtime, which can also be an issue if I am not careful. So yeah, it’s the fix for the issue in the post, but this solution made me realise that this isn’t the only thing I want.

But the second part is… Just chill. It’s a home server. Not a high availability cluster. I can afford stupid things. Heck, I’m only asking this question because I was stupid and didn’t limit the job count of a cargo build, which took down my server. I don’t care that my build crashes. I just want to not have to manually restart it, because when I’m not here I can’t do it.

As for the link that you sent, it’s about container limits, not image build limits. And I have already set some up on my hungriest container; stats showed that it blew past them, so idk what’s going on there.

Edit: NVM. This is a bandaid fix. What if you forget to add the flag? Say it’s been 5 months since the last time and you forget to apply the same fix? Or you accidentally remove it while editing the command? I’m actually looking for a solution that fixes my problem fully, not a partial solution.

@just_another_person · 8 months ago

Then you didn’t explain the issue very well, because what you’re asking for was given to you exactly. Builds also have flags, and you should know that if you’re complaining about advice given to you. I’m not saying that to admonish you, just giving you the info.

The next step down is that you’re using Portainer, and having user-error issues somehow. So another solution is renaming these actions something with a very obvious prefix like “BUILD ACTION”, but also setting memory limits.

The very last step is making sure your swap is in order. Allocate 2x your system memory to swap, and this will help alleviate OOM issues to a point, but especially during builds.

If you come back and say this is a band-aid solution, get a better machine and stop asking questions to solve the impossible in here. It’s your fault this is an issue to begin with, you don’t know how to run your machines (regardless of it just being a home server or whatever), and you’re just being rude.

Badabinski · 8 months ago

The other person may have responded with a fair amount of hostility, but they’re absolutely correct. I run Kubernetes clusters hosting millions of containers across hundreds of thousands of VMs at my job, and OOMKills are just a fact of life. Apps will leak memory, and you’re powerless to fix it unless you’re willing to debug the app and fix the leak. It’s better for the container to run out of memory and trigger a cgroup-scoped OOM kill. A system-wide OOM kill will murder the things you love, shit in your hat, and lick your face like David Tennant licked Krysten Ritter.

@RustyNova · 8 months ago

Oh, it’s not a problem to let a container get killed. It’s perfectly fine. What I want is just to not cripple my whole server because one container did a funny.

If it keeps Docker and the Portainer VM running I’ll be 100% ok, because I can just restart it. I don’t want to have remote access to my server outside of my home for security reasons, so this is just the bare minimum.

@[email protected] · 8 months ago

Those remote access fears can be solved with a wireguard VPN

@[email protected] · 8 months ago

I don’t want to have remote access to my server outside of my home for security reasons, so this is just the bare minimum

What are your security concerns?

@Treczoks · 8 months ago

This is not a bandaid, this is the solution. What you’re trying is, at least for this scenario, the bandaid.

@[email protected] · 8 months ago

??? Your original proposed solution is literally a bandaid fix.

Magiilaro · 8 months ago

Systemd has config options for automatic restart of crashed services. https://www.freedesktop.org/software/systemd/man/latest/systemd.service.html#Restart=
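
A rough sketch of what that could look like as a systemd drop-in for the Docker unit. The helper name `write_restart_dropin` is hypothetical, and the drop-in directory shown in the comment is an assumption based on the usual systemd layout. Docker's packaged unit may already set a `Restart=` policy, so check with `systemctl cat docker` before adding anything.

```shell
# Sketch: write a systemd drop-in that restarts the Docker daemon if it dies.
# "write_restart_dropin" is a hypothetical helper. The target directory is
# passed in so the result can be inspected before installing it.
write_restart_dropin() {
  dropin_dir="$1"   # assumed: /etc/systemd/system/docker.service.d on a real system
  mkdir -p "$dropin_dir"
  # Restart=on-failure covers the service being killed by a signal such as
  # SIGKILL, but not a deliberate "systemctl stop docker".
  cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
Restart=on-failure
RestartSec=5s
EOF
}
```

On a real machine this would be run as root against the drop-in directory, followed by `systemctl daemon-reload` so systemd picks up the change.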

@[email protected] · edit-2 8 months ago

Do you have your services set up with restart=unless-stopped? I wonder if that would auto restart them after OOM.
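
For containers that already exist, a restart policy can be applied after the fact. A minimal sketch, assuming the containers are managed with the plain Docker CLI; `set_restart_policy` and the container names are hypothetical.

```shell
# Sketch: apply the "unless-stopped" restart policy to existing containers.
# "set_restart_policy" is a hypothetical helper name.
set_restart_policy() {
  # Each argument is the name or ID of an existing container.
  for name in "$@"; do
    docker update --restart unless-stopped "$name"
  done
}
```

For example, `set_restart_policy web db` would update two containers named `web` and `db`. This policy is about containers, so it helps once the daemon itself is back up; it does not restart the daemon.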

@KrapKake · 8 months ago

You should be able to make Docker exempt from earlyoom. Check its GitHub for instructions.

@RustyNova · 8 months ago

But can it protect only Docker from being killed, and not the build/big container processes?

@[email protected] · 8 months ago

I don’t know the best way but I would use cron and start docker every minute (if it’s not running).

@[email protected] · 8 months ago

I don’t know the best way

Apparently…

Don’t do this. Either don’t go OOM to begin with (somebody else told you how to limit container memory usage) and/or configure systemd to restart docker if it quits. I’m surprised systemd isn’t doing that already.

@[email protected] · edit-2 8 months ago

It’s usually good to state why something is good or bad :)

@[email protected] · edit-2 8 months ago

It’s fairly obvious I feel.

You’re saying rather than use a system tool that does the exact thing that you want you should bodge together a cron job that accomplishes your goal but doesn’t actually do what you want.

Like say you want to stop the docker service for some reason? systemctl stop docker will do that. Then your cron job will restart it. That’s not the desired outcome. You want the service running IF the service SHOULD be running. Which is a different thing than “always running”. And it’s exactly what you get for free with systemd without any silly custom BS.

@RustyNova · 8 months ago

Seems like the best solution. I’ll look into it

@[email protected] · 8 months ago

Seems like the best solution.

Over using a system tool designed to monitor and restart services that stop?

@RustyNova · 8 months ago

? I’m agreeing with you?

@[email protected] · 8 months ago

Sorry, it was ambiguous and I thought you were saying the “cron” thing sounded best.

@RustyNova · 8 months ago

I’ll try that. I know that systemctl has a start-or-reload command, but are there any “start-or-ignore” commands? Or start flags?
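
On the “start-or-ignore” question: starting a unit that is already active is a no-op with plain `systemctl start`, so a separate command is generally not needed. If the intent should be explicit anyway, a guard like the following sketch works; `ensure_running` is a hypothetical helper name, and a `Restart=` policy in the unit itself remains the less fragile route.

```shell
# Sketch: start a unit only if it is not already active.
# "ensure_running" is a hypothetical helper name.
ensure_running() {
  unit="$1"
  # "is-active --quiet" exits with status 0 only when the unit is active.
  if ! systemctl is-active --quiet "$unit"; then
    systemctl start "$unit"
  fi
}
```

Calling `ensure_running docker` would then start the daemon only when it is down. Note that, like the cron idea, this would also bring the service back after a deliberate `systemctl stop docker`.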