ZFS on NixOS

As I am still experimenting with my NixOS setup, I thought it would be nice to separate the user data onto a separate NVMe SSD. The plan was to use ZFS and put my /var/lib on it. This would allow me to create snapshots which can be pushed or pulled to my other ZFS systems. That all sounded easy enough but took way longer than expected.

Hardware

It all starts with a new NVMe SSD. I got a WD Blue SN570 2000 GB, M.2 2280, because it was very cheap. And here is my first learning: apparently one should re-run nixos-generate-config or add the nvme module by hand to the hardware config (boot.initrd.availableKernelModules) to allow NixOS to correctly detect the new hardware. (I lost a lot of time figuring this out.)
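For reference, here is roughly what the relevant line in hardware-configuration.nix looks like after re-running nixos-generate-config. The exact module list is an assumption and will differ per machine; the important part is that "nvme" shows up in it:

# hardware-configuration.nix (sketch, module list will differ per machine)
boot.initrd.availableKernelModules = [ "nvme" "xhci_pci" "ahci" "usb_storage" "sd_mod" ];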

Software

Creating the ZFS pool is the usual procedure. But one thing to note is the device name: since NixOS imports pools using the /dev/disk/by-id/ path, it is recommended to use that path to create the pool. The by-id name should also stay consistent across hardware changes, while other mappings might change and lead to a broken pool. At least that is my understanding of it. (Source: people on the internet, and "Inconsistent Device Names Across Reboot Cause Mount Failure Or Incorrect Mount in Linux".)
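A quick way to find that stable name is to list the by-id symlinks and see which one points at the new disk (a sketch, not the exact command from my shell history):

# list the stable ids and the devices they point to
ls -l /dev/disk/by-id/ | grep nvme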

sudo zpool create -f -O atime=off -O utf8only=on -O normalization=formD -O aclinherit=passthrough -O compression=zstd -O recordsize=1m -O exec=off tank /dev/disk/by-id/nvme-eui.e8238fa6bf530001001b448b4e246dab
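To double-check that the pool came up with the intended properties, something like this can be used (a sketch):

# verify the pool and the properties set above
zpool status tank
zfs get compression,recordsize,atime,exec tank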

Move data

On the new pool we create datasets and mount them.

zfs create -o canmount=on tank/var
zfs create -o canmount=on tank/var/lib
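A quick zfs list shows the new datasets and where they ended up mounted (by default under /tank, the pool's mountpoint):

zfs list -r -o name,mountpoint,canmount tank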

Then we can copy over all the current data from /var/lib.

# 1. stop all services accessing `/var/lib`
# 2. copy the data (-a preserves ownership, permissions and symlinks)
sudo cp -a /var/lib/* /tank/var/lib/
sudo rm -rf /var/lib/
sudo zfs set mountpoint=/var/lib tank/var/lib
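Afterwards it is worth checking that /var/lib is really backed by the new dataset now (a sketch of the check):

# /var/lib should now be the ZFS dataset, not the old directory
findmnt /var/lib
zfs list -o name,mountpoint,mounted tank/var/lib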

And here is the rest of my NixOS config for ZFS:

# Setup ZFS
# Official resources:
# - https://nixos.wiki/wiki/ZFS
# - https://openzfs.github.io/openzfs-docs/Getting%20Started/NixOS/index.html#installation

# Enable support for ZFS and always use a compatible kernel
boot.supportedFilesystems = [ "zfs" ];
boot.zfs.forceImportRoot = false;
boot.kernelPackages = config.boot.zfs.package.latestCompatibleLinuxPackages;

# head -c 8 /etc/machine-id
# The primary use case is to ensure when using ZFS 
# that a pool isn’t imported accidentally on a wrong machine.
networking.hostId = "aaaaaaaa";

# Enable scrubbing once a week
# https://openzfs.github.io/openzfs-docs/man/master/8/zpool-scrub.8.html
services.zfs.autoScrub.enable = true;

# Names of the pools to import
boot.zfs.extraPools = [ "tank" ];

And in the end, run sudo nixos-rebuild switch to build and activate the new configuration.
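After a reboot the pool should be imported automatically. Checking the generated import unit and the pool status confirms it; the unit name below is my assumption based on how NixOS names the per-pool import services:

# check that the pool got imported at boot
systemctl status zfs-import-tank.service
zpool status tank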

Fucked up ZFS Pool

In the end I had to do everything again and start fresh, because my system did not import my ZFS pool after a reboot. Here are the key things I learned.

NixOS imports the pools by-id by running a command like this (source):

zpool import -d "/dev/disk/by-id" -N tank

This can be configured via boot.zfs.devNodes (source). It took a while to figure out since I usually just run zpool import tank.
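For completeness, here is the option in Nix syntax; /dev/disk/by-id is already the default, so it only needs to be set if the importer should look somewhere else (a sketch):

# where NixOS looks for the pool's devices on import (default: /dev/disk/by-id)
boot.zfs.devNodes = "/dev/disk/by-id";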

And the behavior I saw was:

zpool import tank <- works
zpool import -d "/dev/disk/by-id" -N tank <- fails

As it turns out, wipefs does not necessarily remove all zpool information from a disk.

$ sudo wipefs -a /dev/nvme0n1
/dev/nvme0n1: 8 bytes were erased at offset 0x1d1c10abc00 (zfs_member): 0c b1 ba 00 00 00 00 00
/dev/nvme0n1: 8 bytes were erased at offset 0x1d1c10a9800 (zfs_member): 0c b1 ba 00 00 00 00 00
/dev/nvme0n1: 8 bytes were erased at offset 0x1d1c10a8000 (zfs_member): 0c b1 ba 00 00 00 00 00
...

While wipefs reports everything as erased, zdb shows that there is in fact still a ZFS label on the disk.

$ sudo zdb -l /dev/nvme0n1
failed to unpack label 0
------------------------------------
LABEL 1
------------------------------------
    version: 5000
    name: 'tank'
    state: 1
    txg: 47
    pool_guid: 16638860066397443734
    errata: 0
    hostid: 2138265770
    hostname: 'telesto'
    top_guid: 4799150557898763025
    guid: 4799150557898763025
    vdev_children: 1
    vdev_tree:
        type: 'disk'
        id: 0
        guid: 4799150557898763025
        path: '/dev/disk/by-id/nvme-eui.e8238fa6bf530001001b448b4e246dab'
        whole_disk: 0
        metaslab_array: 64
        metaslab_shift: 34
        ashift: 9
        asize: 2000394125312
        is_log: 0
        create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    labels = 1 2 3

And the way to clear that is by dd-ing over the right spots at the front and at the back of the disk, since ZFS keeps two copies of the label at the start of the device and two at the end.

# zero the first 2 MiB, which covers the two labels at the start of the disk
sudo dd if=/dev/zero of=/dev/nvme0n1 count=4 bs=512k
# zero everything from this offset (in 512-byte sectors) to the end of the disk,
# which covers the two labels at the back
sudo dd if=/dev/zero of=/dev/nvme0n1 oseek=3907027120
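After that, zdb should fail to unpack all four labels instead of still finding one (a sketch of the check):

# all four label slots should now be empty
sudo zdb -l /dev/nvme0n1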

There is a Super User answer which shows how that works. And here is my lengthy back and forth where we figured out that this was the issue.

Other resources which were helpful