Proxmox - Smartest ZFS Pool Replication Process Across Cluster?

@pr0927 · edit-2 2 months ago

Proxmox - Smartest ZFS Pool Replication Process Across Cluster?

@computergeek125 · edit-2 2 months ago

I think you’re asking too much from ZFS. Ceph, Gluster, or some other form of cluster native filesystem (GFS, OCFS, Lustre, etc) would handle all of the replication/writes atomically in the background instead of having replication run as a post processor on top of an existing storage solution.

You specifically mention a gap window - that gap window is not a bug, it’s a feature of using a replication timer, even if it’s based on an atomic snapshot. The only way to get around that gap is to use different tech. In this case, all of those above options have the ability to replicate data whenever the VM/CT makes a file I/O - and the workload won’t get a write acknowledgement until the replication has completed successfully. As far as the workload is concerned, the write just takes a few extra milliseconds compared to pure local storage (which many workloads don’t actually care about)

I’ve personally been working on a project to convert my lab from ESXi vSAN to PVE+Ceph, and conversions like that (even a simpler one like PVE+ZFS to PVE+Ceph would require the target disk to be wiped at some point in the process.

You could try temporarily storing your data on an external hard drive via USB, or if you can get your workloads into a quiet state or maintenance window, you could use the replication you already have and rebuild the disk (but not the PVE OS itself) one node at a time, and restore/migrate the workload to the new Ceph target as it’s completed.

On paper, (I have not yet personally tested this), you could even take it a step farther: for all of your VMs that connect to the NFS share for their data, you could replace that NFS container (a single point of failure) with the cluster storage engine itself. There’s not a rule I know of that says you can’t. That way, your VM data is directly written to the engine at a lower latency than VM -> NFS -> ZFS/Ceph/etc

@pr0927 · 2 months ago

Oh I didn’t think the gap window is a bug - I was just acknowledging it, and I’m OK with it.

Definitely some ideas one day for the future but with my current time, architecture, and folks depending on certain services (and my own sanity with the many months I already spent on this), not really looking to re-do anything or wipe drives.

Just want to make the best of my ZFS situation for now - I know it can’t do everything that Ceph and GlusterFS can do.