I woke up this morning to a text from my ISP: “There is an outage in your area, we are working to resolve the issue.”

I laughed. This is what I live for! Almost all of my services are self-hosted, so I’m barely going to notice the difference!

Wrong.

When the internet went out, the power also went out for a few seconds. Four small computers host all of my services. Of those, one shut down and three rebooted. Of the three that rebooted uncleanly, some services came back online and some didn’t.

30 minutes later, the ISP sent out a text saying that service was back online.

2 hours later I’m still finding down services on my network.

Moral of the story: A UPS has moved to the top of the shopping list! Any suggestions??

  • @Kuinox
    115
    10 months ago

    When you are bored, back up a VM, then hard kill it and see if it manages to restart properly.
    Software should be able to recover from that.
    If it doesn’t, troubleshoot.
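A minimal sketch of the "see if it restarts properly" step, assuming each service exposes a TCP port. The service names, hosts, and ports below are hypothetical placeholders, not anything from this thread. After hard killing and restarting the VM, it polls until every port accepts connections or a deadline passes, then reports what is still down.

```python
import socket
import time

# Hypothetical service list: replace names, hosts, and ports with your own.
SERVICES = {
    "web": ("192.168.1.10", 443),
    "dns": ("192.168.1.11", 53),
}


def is_up(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_services(services, deadline_s=300.0, interval_s=5.0):
    """Poll until every service accepts connections or the deadline passes.

    Returns the sorted names of the services that are still down.
    """
    pending = dict(services)
    end = time.monotonic() + deadline_s
    while pending:
        for name, (host, port) in list(pending.items()):
            if is_up(host, port):
                del pending[name]
        if not pending or time.monotonic() >= end:
            break
        time.sleep(interval_s)
    return sorted(pending)


# Usage after restarting the VM:
#   down = wait_for_services(SERVICES)
#   print("still down:", ", ".join(down) if down else "nothing")
```

An open port only shows that something is listening, so a real check would probably also make an application-level request for each service.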

    • Deebster
      17
      10 months ago

      That reminds me of Netflix’s Chaos Monkey (basically, during office hours this tool randomly kills stuff).

    • @BlackAura
      11
      10 months ago

      When I built my home server, this is what I did with all my VMs. I learned how to change the startup delay time in ESXi and ensured everything came back online with no issues from a cold boot.

      RIP VMware.

    • @[email protected]
      -11
      10 months ago

      While I appreciate the sentiment, most traditional VMs do not like to have their power killed (especially non-journaling file systems).

      Even crash consistent applications can be impacted if the underlying host fs is affected by power loss.

      I do think that backups are a valid suggestion here, provided that the backup is not interrupted by a power surge or loss.

      • @[email protected]
        43
        10 months ago

        “most traditional VMs do not like to have their power killed (especially non-journaling file systems).”

        Why are you using a non-journaling file system in 2024 when those were common 10+ years ago?

          • @[email protected]
            1
            10 months ago

            I would still consider that generation of filesystem to take effort to use, while regular journaling filesystems have become so ubiquitous that you need to invest effort to avoid using one.

              • @[email protected]
                0
                10 months ago

                Maybe that is the case on some distros if you install a recent version, but to get a non-journaling filesystem you literally have to partition manually on any distro that is still supported today and meant for full-sized PCs (as opposed to embedded devices).

                  • @[email protected]
                    0
                    10 months ago

                    If you want to use a filesystem that is so bad that it doesn’t even have journaling, you need to select it manually. No distro has used one of those by default for 10-15 years now.

        • lazynooblet
          4
          10 months ago

          It’s been a while since a power cut affected my services. Is this why?

          I remember having to troubleshoot MySQL corruption following abrupt power loss. Is this no longer a thing?

          • @[email protected]
            8
            10 months ago

            Databases shouldn’t even need a journaling filesystem; they usually pay attention to when to use fsync and fdatasync.

            In fact, journaling filesystems basically use the same mechanisms as databases, only for filesystem metadata.
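To illustrate the fsync point, here is a rough sketch of the common write, fsync, rename pattern that applications can use to survive a power cut. It is not how any particular database does it; `durable_write` is a hypothetical helper name, and the directory fsync assumes a POSIX system.

```python
import os


def durable_write(path, data):
    """Replace the file at path with data (bytes).

    The intent is that after a crash the file holds either the complete
    old contents or the complete new contents, never a partial write.
    """
    tmp = path + ".tmp"  # hypothetical temp-file naming scheme
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()             # push Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to push the data to the device
    os.replace(tmp, path)     # atomic rename on POSIX filesystems

    # Assumption: POSIX. fsync the parent directory so that the rename
    # itself is persisted, not only the file contents.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Whether the data truly reaches stable storage still depends on the drive and its write cache honoring the flush, which is outside what this sketch can guarantee.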

      • Possibly linux
        13
        10 months ago

        Your system should be fine after a hard kill. If it’s not, stop using it, as that’s going to be a problem down the road.