Stop services while creating snapshots during backup?

Avid Amoeba · edit-2 7 months ago

Stop services while creating snapshots during backup?

@gedhrel · 7 months ago

I’d be cautious about the “kill -9” reasoning. It isn’t necessarily equivalent to yanking power.

Contents of application memory lost, yes. Contents of unflushed OS buffers, no. Your db will be fsyncing (or moral equivalent thereof) if it’s worth the name.

This is an aside; backing up from a volume snapshot is half a reasonable idea. (The other half is ensuring that you can restore from the backup, regularly, automatically, and the third half is ensuring that your automated validation can be relied on.)

Avid Amoeba · edit-2 7 months ago

Contents of application memory lost, yes. Contents of unflushed OS buffers, no. Your db will be fsyncing (or moral equivalent thereof) if it’s worth the name.

Good point. I guess kill -9 is somewhat less catastrophic than a power-yank. If a service is written well enough to handle the latter it should be able to handle the former. Should, subject to very interesting bugs that can hide in the difference.

This is an aside; backing up from a volume snapshot is half a reasonable idea. (The other half is ensuring that you can restore from the backup, regularly, automatically, and the third half is ensuring that your automated validation can be relied on.)

I’m currently thinking of setting up automatic restore of these backups on the off-site backup machine. That is the backups are transferred to the off-site machine, restored to the dirs of the services, then the services are started. This should cover the second half I think. Of course those services can’t be used to store new data because they’ll be regularly overwritten with every backup. In the event of a hard snafu where the main machine disappears, I could stop the auto restore on the off-site machine and start using the services from it, effectively making it the main machine. If this turns out to be reasonable and working, I might trash all of the file-based backup-and-transfer mechanisms and switch to ZFS send/recv. That should allow to shrink the data delta between main and off-site to minutes instead of hours or days. Does this make any sense?