I have this Debian server that won’t stop crashing. It crashes once every 2 or 3 days. Everything’s up to date, the CPU is good and Prime95 never finds any problems. The RAM is good too: I’ve run every RAM test there is and never found anything wrong. I just can’t get it to stop crashing and it’s driving me insane.

I used to have an Arduino connected to the motherboard’s reset jumper, with a bash script set up as a systemd service that sent a signal to the Arduino every 10 seconds. If the Arduino didn’t receive a signal for 30 seconds, it forced a reboot. This doesn’t fully automate restarting after a crash because, too often, the server crashes just lightly enough that everything except that autorestart bash script service stops working, so it won’t reboot. It does double the amount of time the server works without manual intervention, which is better than nothing but not good enough.
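
One possible way around the "half-crashed" case described above is to send the heartbeat only while a few health checks pass, so the Arduino stops getting signals as soon as something important is dead. The following is only a minimal sketch under assumptions, not the original script: the serial device path `/dev/ttyACM0`, the single byte that is sent, and the SSH port check are hypothetical placeholders, and the serial port is assumed to be configured already.

```python
import socket
import time
from typing import Callable, Iterable


def all_checks_pass(checks: Iterable[Callable[[], bool]]) -> bool:
    """Return True only if every health check returns True without raising."""
    for check in checks:
        try:
            if not check():
                return False
        except Exception:
            # A check that raises is treated the same as a failed check.
            return False
    return True


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Example check: can a TCP connection to host:port be opened?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def heartbeat_loop(send, checks, interval=10.0, sleep=time.sleep, max_beats=None):
    """Call send() once per interval, but only while all checks pass.

    A skipped heartbeat is what lets the external watchdog time out.
    Returns the number of heartbeats sent (only when max_beats is set).
    """
    sent = 0
    beats = 0
    while max_beats is None or beats < max_beats:
        if all_checks_pass(checks):
            send()
            sent += 1
        beats += 1
        sleep(interval)
    return sent


if __name__ == "__main__":
    # Hypothetical setup: check local SSH, signal the Arduino over serial.
    checks = [lambda: tcp_port_open("127.0.0.1", 22)]
    with open("/dev/ttyACM0", "wb", buffering=0) as port:
        heartbeat_loop(lambda: port.write(b"1"), checks)
```

The checks should cover whatever "working" means for this server (a service port, a file write to the data disk, and so on). A check that hangs also blocks the loop, which stops the heartbeat and should let the Arduino's 30-second timeout fire.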

Other than just randomly installing different distros until I find one that doesn’t do this (reinstalling an OS and then setting all the server stuff back up is very time-consuming), what can I do to troubleshoot/solve/stop or otherwise do anything about these crashes?

  • @[email protected]
    3
    10 months ago

    I’ve been using Debian for years without crashes, so I don’t think it’s a software issue. It sounds like a hardware issue to me; it could be your motherboard or power supply.

    • @PeterPoopshitOP
      2
      10 months ago

      I’ve already replaced the CPU and the hard drives. I could try swapping out the RAM even though it’s never failed any kind of RAM check. I have an EVGA 450 BR PSU, which is supposedly a good budget PSU, but I guess I could try replacing it. When I’m done with my gaming watercooling build I’ll have a spare motherboard I could try, but if I change the motherboard, I’d likely have to reinstall just to get all the chipset drivers to work. And if I’m reinstalling an OS, I should choose something other than Debian, because I would be changing 2 things in the same amount of time and effort it takes to change 1 thing, which doubles the chances of arriving at a combination of things that results in it not crashing anymore.

      If the only way to get this working is seriously to randomly replace more parts and hope something finally works, I might make a serious effort to go back to using my known stable Athlon XP I was using before I “upgraded” to this one. I’d have to install Gentoo and probably lose compatibility with a few things, but I am so sick and tired of dealing with this server crashing all the time that it might be worth it.

      • @[email protected]
        1
        10 months ago

        Yeah, unfortunately it’s damn near impossible to pin down the failing part exactly without a bunch of spare parts.

        You could look around your mobo for bulging capacitors, but that could be a long shot.

        You could also try sifting through your journalctl looking for warnings and errors.

  • @liquidpaper
    2
    10 months ago

    Sorry to come so late to the post, but have you checked the logs to see what was going on before the crash?

    Try journalctl -b -1 -e; it jumps to the end of the logs from the previous boot, which is where anything logged just before the crash would be.
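
If the previous boot's log is long, it may help to pull out only the error-level entries and count which ones repeat. The sketch below is one hedged way to do that, assuming the journal is kept across reboots (otherwise there is no previous boot to read and `journalctl` exits with an error). The function names are made up for illustration; the `journalctl` options used (`--boot`, `--priority`, `--output=json`, `--no-pager`) are standard ones.

```python
import json
import subprocess
from collections import Counter


def read_boot_errors(boot="-1", priority="err"):
    """Return the JSON lines journalctl prints for one boot at a given priority."""
    cmd = [
        "journalctl",
        f"--boot={boot}",          # -1 means the boot before the current one
        f"--priority={priority}",  # "err" and anything more severe
        "--output=json",           # one JSON object per line
        "--no-pager",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


def summarize_entries(json_lines):
    """Count (source, message) pairs, skipping lines that are not JSON objects."""
    counts = Counter()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(entry, dict):
            continue
        source = entry.get("_SYSTEMD_UNIT") or entry.get("SYSLOG_IDENTIFIER") or "unknown"
        message = entry.get("MESSAGE")
        if not isinstance(message, str):
            # journald emits non-text messages as arrays or null.
            message = "<non-text message>"
        counts[(source, message)] += 1
    return counts


if __name__ == "__main__":
    summary = summarize_entries(read_boot_errors())
    for (source, message), count in summary.most_common(20):
        print(f"{count:5d}  {source}: {message}")
```

A hard lockup often leaves nothing in the log at all, so an empty or unremarkable result here is itself a hint that points toward hardware rather than software.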

  • @vegetaaaaaaa
    2
    9 months ago

    • Define “crashing”. What stops working?
    • Same troubleshooting steps as for any other computer program: start by checking the logs (/var/log/syslog, journalctl, or application-specific logs, depending on what exactly is “crashing”).