CPU load over 70 means I can't even ssh into my server

@PlutoniumAcid · edit-2 2 years ago

CPU load over 70 means I can't even ssh into my server

Black Xanthus · edit-2 2 years ago

The last time I saw this was on a slow-failing HDD.

A quick fsck might get you a few answers. You can find more info in the Linux manual. It could just be one or two bad blocks that you can recover, which would fix the problem (though, ofc, it’s time to back up your data).

The other, slightly unusual time I’ve seen it is with mixed RAM. 16GB made of 2x6GB and then 2x4GB did some really odd things to the system. If it’s not the disk, and your box will boot with one stick of RAM, try it to see if it fixes the issue. It could be that your RAM speeds are off (or you’re like me and just put in two sticks you had lying around, and it basically worked until it didn’t).

An outlier that I’ve not seen on modern machines is I/O wait for a CD-ROM to spin up, even if you’re not accessing the CD-ROM. It’s normally caused by bad cabling. Based on the age of your machine, this is unlikely, but it might be worth unplugging devices to see if one is bad and not reporting properly.
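
To tell an I/O stall apart from real CPU work, one option is to measure how much CPU time is spent in iowait. On Linux the load average also counts tasks stuck in uninterruptible sleep, so a stalled disk can push load very high while the CPU is mostly idle. Here is a minimal sketch, assuming a Linux box with /proc/stat available; the function names are my own, not from any tool:

```python
import os
import time


def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_fraction(before_text, after_text):
    """Share of CPU time spent in iowait between two /proc/stat snapshots."""
    before = parse_cpu_line(before_text)
    after = parse_cpu_line(after_text)
    deltas = [new - old for old, new in zip(before, after)]
    # Field order: user nice system idle iowait irq softirq steal.
    # Guest time is already included in user/nice, so sum only the first eight.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return deltas[4] / total


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    with open("/proc/stat") as stat_file:
        first = stat_file.read()
    time.sleep(1)
    with open("/proc/stat") as stat_file:
        second = stat_file.read()
    print(f"iowait over the last second: {iowait_fraction(first, second):.0%}")
```

A persistently high iowait share alongside a high load would point towards the disk (or another slow device) rather than a runaway process.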

This is, of course, assuming dmesg shows no errors.
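
If the kernel log (dmesg) is long, a small filter can help spot disk-related lines. This is only a sketch: the patterns below are illustrative examples I picked, real messages vary by kernel version and driver, and dmesg may need root on some systems.

```python
import re
import subprocess

# Illustrative patterns only (an assumption); extend them for your hardware.
SUSPICIOUS = re.compile(
    r"I/O error|Medium Error|EXT4-fs error|failed command", re.IGNORECASE
)


def find_disk_errors(log_text):
    """Return the log lines that match any of the suspicious patterns."""
    return [line for line in log_text.splitlines() if SUSPICIOUS.search(line)]


if __name__ == "__main__":
    try:
        result = subprocess.run(
            ["dmesg"], capture_output=True, text=True, errors="replace"
        )
    except OSError as exc:
        print(f"could not run dmesg: {exc}")
    else:
        if result.returncode != 0:
            print("dmesg failed (it may need root):", result.stderr.strip())
        for line in find_disk_errors(result.stdout):
            print(line)
```

No matches does not prove the disk is healthy; it only means none of these particular strings appeared.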

Final thought: see if you’re running SELinux. If you are, turn it off and try again. Those policies are complex, and something installed in a non-standard place could be causing SELinux to slow IO as it fills your logs with warnings.
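
A quick way to see whether SELinux is active, and whether it is logging denials, could look like the sketch below. It assumes the usual selinuxfs mount at /sys/fs/selinux; the helper names are mine, and the audit log location differs between distros, so the denial counter just takes log text you supply.

```python
def interpret_enforce_flag(flag_text):
    """Map the contents of the selinuxfs 'enforce' file to a mode name."""
    return "enforcing" if flag_text.strip() == "1" else "permissive"


def selinux_mode(enforce_path="/sys/fs/selinux/enforce"):
    """Return 'enforcing', 'permissive', or 'disabled' for this machine."""
    try:
        with open(enforce_path) as enforce_file:
            return interpret_enforce_flag(enforce_file.read())
    except OSError:
        # The file is missing or unreadable: SELinux is treated as disabled.
        return "disabled"


def count_avc_denials(log_text):
    """Count lines that look like SELinux AVC denial records."""
    return sum(
        1 for line in log_text.splitlines() if "avc:" in line and "denied" in line
    )


if __name__ == "__main__":
    print("SELinux mode:", selinux_mode())
```

If the mode is "disabled", SELinux is not what is slowing the box down. If it is "enforcing" and the denial count in your audit log keeps climbing, the theory above becomes more plausible.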

Hope that helps,

ActuallyRuben · 2 years ago

To add on to this, if you’re using some random RAM stick picked out of the gutter, then it might be worth running memtest86+. Bad RAM can cause some weird, unpredictable issues.

@PlutoniumAcid · 1 year ago

Do not run fsck on a mounted device

So how do I run this on /dev/sda? I can’t very well unmount the OS drive…
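
For what it’s worth, here is a rough sketch for checking which partitions of a disk are currently mounted before pointing fsck at them. It assumes Linux’s /proc/mounts and plain device names like /dev/sda1; filesystems mounted via LVM, device-mapper, or /dev/root would not be matched, and the function name is made up for illustration.

```python
import os
import re


def mounted_at(device, mounts_text):
    """Return (source, mount point) pairs for a device or its partitions."""
    # Matches the device itself, /dev/sda1-style or nvme0n1p1-style partitions.
    pattern = re.compile(re.escape(device) + r"(p?\d+)?")
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and pattern.fullmatch(fields[0]):
            hits.append((fields[0], fields[1]))
    return hits


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as mounts_file:
        for source, mount_point in mounted_at("/dev/sda", mounts_file.read()):
            print(f"{source} is mounted at {mount_point}")
```

Anything this prints is mounted, so the warning about not running fsck on a mounted device applies to it.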