Setting up ZFS on a Proxmox VM (an update)

I previously asked here about moving to ZFS. So a week on I’m here with an update. TL;DR: Surprisingly simple upgrade.

I decided to buy another HBA that came pre-flashed in IT mode and without an onboard BIOS (so that server bootups would be quicker - I’m not using the HBA attached disks as boot disks). For £30 it seems worth the cost to avoid the hassle of flashing it, plus if it all goes wrong I can revert back.

I read a whole load about Proxmox PCIE passthrough, most of it out of date it would seem. I am running an AMD system and there are many sugestions online to set grub parameters to amd_iommu=on, which when you read in to the kernel parameters for the 6.x version proxmox uses, isn’t a valid value. I think I also read that there’s no need to set iommu=pt on AMD systems. But it’s all very confusing as most wikis that should know better are very Intel specific.

I eventually saw a youtube video of someone running proxmox 8 on AMD wanting to do the same as I was and they showed that if IOMMU isn’t setup, then you get a warning in the web GUI when adding a device. Well that’s interesting - I don’t get that warning. I am also lucky that the old HBA is in its own IOMMU group, so it should pass through easy without breaking anything. I hope the new one will be the same.

Worth noting that there are a lot of bad Youtube videos with people giving bad advise on how to configure a VM for ZFS/TrueNAS use - you need them passed through properly so the VM’s OS has full control of them. Which is why an IT HBA is required over an IR one, but just that alone doesn’t mean you can’t set the config up wrong.

I also discovered along the way that my existing file server VM was not setup to be able to handle PCIe passthrough. The default Machine Type that Proxmox suggests - i440fx - doesn’t support it. So that needs changing to q35, also it has to be setup with UEFI. Well that’s more of a problem as my VM is using BIOS. A this point it became easier to spin up a new VM with the correct setting and re-do the configuration of it. Other options to be aware of: Memory ballooning needs to be off and the CPU set to host.

At this point I haven’t installed the new HBA yet.

Install a fresh version of Ubuntu Server 24.04 LTS and it all feels very snappy. Makes me wonder about my old VM, I think it might be an original install of 16.04 that I have upgraded every 2 years and was migrated over from my old ESXi R710 server a few years ago. Fair play to it, I have had zero issues with it in all that time. Ubuntu server is just absolutely rock solid.

Not too much to configure on this VM - SSH, NFS exports, etckeeper, a couple of users and groups. I use etckeeper, so I have a record of the /etc of all my VMs that I can look back to, which has come in handy on several occasions.

Now almost ready to swap the HBA after I run the final restic backup, which only takes 5 mins (I bloody love restic!). Also update the fstabs of VMS so they don’t try mount the file server and stop a few from auto starting on boot, just temporarily.

Turn the server off and get inside to swap the cards over. Quite straightforward other than the SAS ports being in a worse place for ease of access. Power back on. Amazingly it all came up - last time I tried to add an NVME on a PCIe card it killed the system.

Set the PICe passthrough for the HBA on the new VM. Luckily the new HBA is on it’s own IOMMU group (maybe that’s somehow tied to the PCIE slot?) Make sure to tick the PCIE flag so it’s not treated as PCI - remember PCI cards?!

Now the real deal. Boot the VM, SSH in. fdisk -l lists all the disks attached. Well this is good news! Try create the zpool zpool create storage raidz2 /dev/disk/by-id/XXXXXXX ...... Hmmm, can’t do that as it knows it’s a raid disk and mdadm has tried to mount it so they’re in use. Quite a bit of investigation later with a combination of wipefs -af /dev/sdX, umount /dev/md126, mdadm --stop /dev/sd126 and shutdown -r now and the RAIDynes of the disks is gone and I can re-run the zpool command. It that worked! Note: I forgot to add in ashift=12 to my zpool creation command, I have only just noticed this as I write, but thankfully it was clever enough to pick the correct one.

$ zpool get all | grep ashift
storage  ashift                         0                              default

Hmmm, what’s 0?

$ sudo zdb -l /dev/sdb1 | grep ashift
ashift: 12

Phew!!!

I also have passed through the USB backup disks I have, mounted them and started the restic backup restore. So far it’s 1.503TB in after precisely 5 hours, which seems OK.

I’ll setup monthly scrub cron jobs tomorrow.

P.S. I tried TrueNAS out in a VM with no disks to see what it’s all about. It looks very nice, but I don’t need any of that fancyness. I’ve always managed my VM’s over SSH which I’ve felt is lighter weight and less open to attack.

Thanks for stopping by my Ted Talk.