It’s fairly obvious why stopping a service while backing it up makes sense. Imagine backing up Immich while it’s running. You start the backup, the database is backed up, and now the image assets are being copied, which could take an hour. While the assets are being copied, a new image is uploaded. The live database knows about it, but the copy you’ve already backed up doesn’t. Then your backup process reaches the new image asset and copies it. If you restore this backup, Immich will contain an asset that the database doesn’t know about. To avoid scenarios like this, you’d stop Immich while the backup is running.
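
To make the failure mode concrete, here is a toy simulation of that sequence. It is only a sketch: the dictionary and set stand in for Immich's database and asset store, and the file names are made up.

```python
def backup_while_running():
    # Live state: the database maps asset IDs to file names,
    # and the asset store holds the files themselves.
    live_db = {1: "a.jpg", 2: "b.jpg"}
    live_assets = {"a.jpg", "b.jpg"}

    # Step 1: the database is backed up first.
    backup_db = dict(live_db)

    # Step 2: while the assets are still being copied, a new image is uploaded.
    live_db[3] = "c.jpg"
    live_assets.add("c.jpg")

    # Step 3: the asset copy reaches the new file and includes it.
    backup_assets = set(live_assets)

    # Assets in the backup that the backed-up database doesn't know about.
    orphans = backup_assets - set(backup_db.values())
    return backup_db, backup_assets, orphans


backup_db, backup_assets, orphans = backup_while_running()
print(sorted(orphans))  # ['c.jpg']
```

The orphaned `c.jpg` is exactly the asset that a restored Immich would have on disk but not in its database.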

Now consider a system that can take instant snapshots, like ZFS or LVM. Immich is running; you stop it, take a snapshot, then restart it. Then you back up Immich from the snapshot while Immich is running. This should reduce the downtime to the time it takes to create the snapshot. The state of the Immich data in the snapshot should be equivalent to that of a backup taken from a stopped Immich instance.
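
A rough sketch of that stop, snapshot, restart sequence for the ZFS case. The container name `immich_server`, the dataset `tank/immich` and the snapshot name `nightly` are placeholders I made up. The function only builds the commands so they can be inspected before anything is run.

```python
import shlex


def snapshot_backup_plan(container, dataset, snapshot_name):
    """Return the commands for a stop -> snapshot -> start cycle, in order."""
    snapshot = f"{dataset}@{snapshot_name}"
    return [
        ["docker", "stop", container],   # downtime begins
        ["zfs", "snapshot", snapshot],   # near-instant on ZFS
        ["docker", "start", container],  # downtime ends
    ]


# Hypothetical names; substitute your own container and dataset.
plan = snapshot_backup_plan("immich_server", "tank/immich", "nightly")
for cmd in plan:
    print(shlex.join(cmd))
```

To actually execute the plan, each entry could be passed to `subprocess.run(cmd, check=True)`. The slow part, copying data out of the snapshot, happens afterwards while the service is already back up.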

Now consider the same case, but without stopping Immich while taking the snapshot. In theory, the data you’re backing up should represent the complete state of Immich at a single point in time, eliminating the possibility of divergence between database and assets. It would, however, represent the state of a live Immich instance, including lock files and the like. Wouldn’t restoring from such a backup be equivalent to a kill -9 or pulling the power cable and then restarting the service? If a service can recover from a cable pull, is it reasonable to expect it to recover from a snapshot taken while it was live? If so, is there much point in stopping services during snapshots?
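
As a small experiment along these lines, the sketch below uses SQLite in write-ahead-log mode as a stand-in database. Immich's actual database is PostgreSQL, so this illustrates the principle rather than Immich's behaviour. The "snapshot" here is a plain file copy, which is only safe because nothing writes during the copy; a real ZFS or LVM snapshot is atomic. Table and file names are made up.

```python
import os
import shutil
import sqlite3
import tempfile


def restore_from_live_copy():
    workdir = tempfile.mkdtemp()
    live_path = os.path.join(workdir, "live.db")

    live = sqlite3.connect(live_path)
    live.execute("PRAGMA journal_mode=WAL")
    live.execute("CREATE TABLE assets (name TEXT)")
    live.execute("INSERT INTO assets VALUES ('a.jpg')")
    live.commit()  # committed before the "snapshot"
    live.execute("INSERT INTO assets VALUES ('b.jpg')")  # in flight, never committed

    # "Snapshot": copy the on-disk files while the connection is still open.
    snap_path = os.path.join(workdir, "snap.db")
    shutil.copy(live_path, snap_path)
    if os.path.exists(live_path + "-wal"):
        shutil.copy(live_path + "-wal", snap_path + "-wal")

    # "Restore": open the copy the way a freshly started service would.
    restored = sqlite3.connect(snap_path)
    rows = [r[0] for r in restored.execute("SELECT name FROM assets ORDER BY name")]

    restored.close()
    live.close()
    shutil.rmtree(workdir)
    return rows


rows = restore_from_live_copy()
print(rows)  # ['a.jpg']
```

The restored copy contains the committed row and not the in-flight one, which is the same outcome you would expect after killing the process and restarting it.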

  • @solrize
    26 months ago

    Stop the whole VM during snapshots.

    • Avid AmoebaOP
      6 months ago

      Not a VM. Consider the service just a program running on the host OS where either the whole OS or just the service data are sitting on ZFS or LVM.

        • Avid AmoebaOP
          6 months ago

          And I’m using Docker, but Docker isn’t helping with the stopping/running during backup conundrum.

            • Avid AmoebaOP
              6 months ago

              Docker doesn’t change the relationship between a running process and its data. At the end of the day you have a process running in memory that opens, reads, writes and closes files that reside on some filesystem. The process must be presented with a valid POSIX environment (or equivalent). What happens to the files when the process is killed instantly, and what happens when it’s started afterwards and re-reads them, doesn’t change based on where the files reside or where the process runs. You could run it in Docker, in a VM, on Linux, on Unix, or even on Windows. You could store the files in a Docker volume, mount them in, or have them on NFS; either way they’re available to the process via filesystem calls. The effects are limited to the interactions between the process and its data. Docker cannot remove this interaction. If it did, the software would break.

                • Avid AmoebaOP
                  6 months ago

                  That’s the trivial scenario that we know won’t fail: stopping the service during the snapshot. The scenario I was asking people’s opinions on is not stopping the service during the snapshot, and what restoring from such a backup would mean.

                  Let me contrast the two by completing your example:

                  • docker start container
                  • Time passes
                  • Time to backup
                  • docker stop container
                  • Make your snapshot
                  • docker start container
                  • Time passes
                  • Shit happens and restore from backup is needed
                  • docker stop container
                  • Restore from snapshot
                  • docker start container

                  Now here’s the interesting scenario:

                  • docker start container
                  • Time passes
                  • Time to backup
                  • Make your snapshot
                  • Time passes
                  • Shit happens and restore from backup is needed
                  • docker stop container
                  • Restore from snapshot
                  • docker start container

                  Notice that in the second scenario we are not stopping the container. The snapshot is taken while it’s live. This means databases and other files are open and likely being actively written to. Some files are likely only partially written. There are also likely various temporary lock files present. All of that is stored in the snapshot. When we restore from this snapshot and start the service, it will see all of that.

                  Contrast this with the trivial scenario, where the service is stopped. Upon stopping, all data is synced to disk, in-flight database operations are completed or canceled, partial writes are completed or discarded, and lock files are cleaned up. When we restore from such a snapshot and start the service, it will “think” it is simply starting after a clean stop, with nothing extra to do.

                  In the live snapshot scenario the service will have to do cleanup. For example, it will have to decide what to do with existing lock files. Are they there because another instance of the service is running and writing to the database, or did someone kill its process before it had the chance to go through its shutdown procedure? In the former case it might have to log an error and quit. In the latter it would have to remove the lock files. And so on and so forth.
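
                  The lock file decision could look roughly like this. It’s a minimal sketch assuming a POSIX system and a lock file that simply contains the owner’s PID; the function names are made up, and real services often use more robust schemes because PIDs can be reused, especially after a restore onto a rebooted host.

                  ```python
                  import os


                  def pid_is_running(pid):
                      """Return True if a process with this PID exists (POSIX only)."""
                      try:
                          os.kill(pid, 0)  # signal 0 only checks for existence
                      except ProcessLookupError:
                          return False
                      except PermissionError:
                          return True  # exists, but belongs to another user
                      return True


                  def acquire_lock(lock_path):
                      """Take the lock, removing a stale lock left by a dead process."""
                      if os.path.exists(lock_path):
                          with open(lock_path) as f:
                              owner = int(f.read().strip())
                          if pid_is_running(owner):
                              # Another instance is alive: log an error and quit.
                              raise RuntimeError(f"already running as PID {owner}")
                          # Owner is gone (killed, or restored from a live snapshot).
                          os.remove(lock_path)
                      with open(lock_path, "w") as f:
                          f.write(str(os.getpid()))
                  ```

                  A service that does something like this on startup handles both cases from the paragraph above: it refuses to start next to a live instance and cleans up after a dead one.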

                  As for the effect of Docker on any of this: whether you use docker stop container, systemctl stop service or pkill service, the effects on the process and its data are all the same. In fact, the docker and systemctl commands result in a termination signal (SIGTERM by default) being sent to the service’s process anyway, followed by a SIGKILL only if it doesn’t exit within the timeout.

                  • @[email protected]
                    16 months ago

                    Oh I see – you’re asking a hypothetical.

                    The simple answer is that it’s a bad idea to take snapshots of running databases because at best they could be missing info and at worst they can be corrupted.

                    The short answer: Don’t.

          • CrolishGrandma
            6 months ago

            It should work that way. If you use the recommended Docker Compose scripts for Immich, you’ll notice that only a few volumes are mounted to store your data. These volumes don’t include information about running instances. If you take snapshots of these volumes, back them up, remove the containers and volumes, then restore the data and rerun the Compose scripts, you should be right where you left off, without any remnants from previous processes. That’s one advantage of container process isolation.