I know that for data storage the best bet is a NAS and RAID1 or something in that vein, but what about all the docker containers you are running, carefully configured services on your rpi, installed *arr services on your PC, etc.?

Do you have a simple way to automate backups and re-installs of these as well or are you just resigned to having to eventually reconfigure them all when the SD card fails, your OS needs a reinstall or the disk dies?

  • @ikidd
    7 months ago

    I run everything on a two-node Proxmox cluster with ZFS mirror volumes and replication of the VMs and CTs between the nodes, run PBS with hourly snapshots, and sync that to multiple USB drives that I rotate off site.

    The Docker VM can be ZFS-snapshotted before major updates so I can roll back.
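
A minimal sketch of the snapshot-before-update step, assuming the Docker VM's disk lives on a ZFS dataset. The dataset name `tank/vm-docker` and the helper names are hypothetical; only `zfs snapshot` and `zfs rollback` are real commands. By default the helper just prints the command instead of running it.

```python
import subprocess
from datetime import datetime, timezone


def snapshot_name(dataset, label, now=None):
    """Return 'dataset@label-YYYYmmdd-HHMMSS' using UTC time."""
    now = now or datetime.now(timezone.utc)
    return f"{dataset}@{label}-{now:%Y%m%d-%H%M%S}"


def snapshot_cmd(dataset, label, now=None):
    """Build the argument list for taking a snapshot."""
    return ["zfs", "snapshot", snapshot_name(dataset, label, now)]


def rollback_cmd(snapshot):
    """Build the argument list for rolling back to a snapshot.

    Plain `zfs rollback` only returns to the most recent snapshot of
    the dataset, so take the snapshot right before the update.
    """
    return ["zfs", "rollback", snapshot]


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


# Hypothetical dataset name; replace with the dataset backing the VM.
run(snapshot_cmd("tank/vm-docker", "pre-update"))
```

If the update goes badly, passing the printed snapshot name to `rollback_cmd` gives the matching rollback command. The VM should be stopped before rolling back its disk.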

    • @[email protected]
      7 months ago

      You should get another node; otherwise, when node1 fails, node2 will reboot itself and then do nothing because it has no quorum.

      • @ikidd
        7 months ago

        pvecm expected 1

        • @[email protected]
          7 months ago

          I know, but every time I had to do that it felt like a janky solution. If you have a Raspberry Pi or something like that, you can also set it up as a QDevice.

          …and if you’re completely fine with how it is, you can also just leave it as it is.
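
The quorum arithmetic behind this exchange can be sketched in a few lines. This is a simplified model, assuming one vote per node, one vote for the QDevice, and a plain majority rule (more than half of the expected votes); the function names are made up for illustration and are not part of any Proxmox or corosync API.

```python
def votes_needed(expected_votes):
    """Smallest number of votes that is a strict majority."""
    return expected_votes // 2 + 1


def is_quorate(votes_present, expected_votes):
    """True if the surviving members still hold a majority."""
    return votes_present >= votes_needed(expected_votes)


# (description, votes still present, expected votes)
scenarios = [
    ("2 nodes, one node down", 1, 2),
    ("2 nodes + QDevice, one node down", 2, 3),
    ("2 nodes, one down, after `pvecm expected 1`", 1, 1),
]

for description, present, expected in scenarios:
    state = "quorate" if is_quorate(present, expected) else "NOT quorate"
    print(f"{description}: {present}/{expected} votes -> {state}")
```

With two votes in total, a single survivor can never hold a majority, which is why the choice is between lowering the expected votes by hand and adding a third vote.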

          • @ikidd
            7 months ago

            So I started to write a reply saying, basically, that I was OK doing that manually, but then thought, “hell, I have a PBS box on the network that would do that fine”. So it took about 3 minutes to install the corosync-qdevice packages on all three and enable it. Good to go.

            Thanks for the kick in the ass.

          • @ikidd
            7 months ago

            So since I now had a “quorate” cluster again, I thought I’d try out HA. I’d always been under the impression that unless you had a shared storage LUN, you couldn’t HA anything. But I thought I’d trigger a replication and then take down the 2nd node just as a test. And lo and behold, the first node brought up my OPNsense VM from the replicated image about 2 minutes after the second node lost contact, and the internet started working again.

            I’m really excited about having that feature working now. This was a good night, thank you.

            • @[email protected]
              7 months ago

              If you need another thing to do, you could try to make your OPNsense HA and never have your internet stop working while rebooting a node. It’s pretty simple to set up; you might finish it in 1-2 evenings. Happy clustering!

              • @ikidd
                7 months ago

                I’ll look into that. I did see the option in OPNsense once upon a time but never investigated it.