Incus GitLab Runner

With the recent fork and drama around LXD, it might be time to give Incus a chance.

Using Incus as a GitLab runner is nice because it provides a simple interface to run containers and VMs for the cases where Docker is not enough. Helpfully, GitLab provides a custom LXD GitLab runner.

Based on that I created a custom Incus GitLab runner. Check it out: https://github.com/fliiiix/gitlab-incus-runner

This can easily be integrated into whatever deployment system you use to set up GitLab runners. (Think Ansible.)

It assumes that Incus is already installed on the runner. To achieve that you can follow the official documentation, or take some inspiration from the next section.
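For reference, the runner is hooked into GitLab as a custom executor through the runner's config.toml. Here is a minimal sketch; the executable paths are placeholders I made up, check the repository's README for the actual setup:

concurrent = 1

[[runners]]
  name = "incus-runner"
  url = "https://gitlab.example.com"
  token = "TOKEN"
  executor = "custom"
  [runners.custom]
    config_exec = "/opt/gitlab-incus-runner/config.sh"
    prepare_exec = "/opt/gitlab-incus-runner/prepare.sh"
    run_exec = "/opt/gitlab-incus-runner/run.sh"
    cleanup_exec = "/opt/gitlab-incus-runner/cleanup.sh"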

Installing Incus

Look at the official documentation. This is just a quick summary of how I did it.

mkdir -p /etc/apt/keyrings/
curl -fsSL https://pkgs.zabbly.com/key.asc -o /etc/apt/keyrings/zabbly.asc

sh -c 'cat <<EOF > /etc/apt/sources.list.d/zabbly-incus-stable.sources
Enabled: yes
Types: deb
URIs: https://pkgs.zabbly.com/incus/stable
Suites: $(. /etc/os-release && echo ${VERSION_CODENAME})
Components: main
Architectures: $(dpkg --print-architecture)
Signed-By: /etc/apt/keyrings/zabbly.asc

EOF'

apt-get update
apt-get install incus

sudo adduser ubuntu incus-admin
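At this point a quick sanity check does not hurt. Note that the group membership from adduser only applies after logging in again:

# Log in again first so the incus-admin group membership applies
incus --version
incus info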

Big kudos to Zabbly and Stéphane Graber for providing pre-built packages!

And to set up Incus I used cloud-config runcmd.

#cloud-config
runcmd:
 - 'incus admin init --preseed < /etc/incus.seed && touch /etc/incus.init'

This assumes that you created your preseed config:

$ cat /etc/incus.seed
config: {}
networks:
- config:
    ipv4.address: auto
    ipv6.address: none
  description: ""
  name: incusbr0
  type: ""
  project: default
storage_pools:
- config:
    size: 200GiB
  description: ""
  name: default
  driver: zfs
profiles:
- config: {}
  description: ""
  devices:
    eth0:
      name: eth0
      network: incusbr0
      type: nic
    root:
      path: /
      pool: default
      type: disk
  name: default
projects: []
cluster: null
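Once the preseed has been applied you can verify that the network, storage pool and profile came up as expected:

incus network show incusbr0
incus storage show default
incus profile show default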

ZFS on NixOS

As I am still experimenting with my NixOS setup I thought it would be nice to separate the user data onto a separate NVMe SSD. The plan was to use ZFS and put my /var/lib on it. This would allow me to create snapshots which can be pushed or pulled to my other ZFS systems. That all sounded easy enough but took way longer than expected.

Hardware

It all starts with a new NVMe SSD. I got a WD Blue SN570 2000 GB, M.2 2280 because it was very cheap. And here is my first learning: apparently one should re-run nixos-generate-config, or add the nvme module by hand to the hardware config (boot.initrd.availableKernelModules), so that NixOS correctly detects the new hardware. (I lost a lot of time figuring this out.)
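A minimal sketch of what that looks like, assuming the default /etc/nixos location:

# Re-generate the hardware config so the new disk is picked up
sudo nixos-generate-config
# "nvme" should now be listed here:
grep availableKernelModules /etc/nixos/hardware-configuration.nix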

Software

Creating the ZFS pool is the usual procedure. But one thing to note is the device name: since NixOS imports pools using the /dev/disk/by-id/ path, it is recommended to use that path to create the pool. The by-id name should also stay consistent across hardware changes, while other mappings might change and lead to a broken pool. At least that is my understanding of it. (Source: people on the internet, and "Inconsistent Device Names Across Reboot Cause Mount Failure Or Incorrect Mount in Linux".)
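To find the stable name of the new disk, list the by-id mappings and see where they point:

ls -l /dev/disk/by-id/ | grep nvme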

sudo zpool create -f -O atime=off -O utf8only=on -O normalization=formD -O aclinherit=passthrough -O compression=zstd -O recordsize=1m -O exec=off tank /dev/disk/by-id/nvme-eui.e8238fa6bf530001001b448b4e246dab

Move data

On the new pool we create datasets and mount them.

zfs create -o canmount=on tank/var
zfs create -o canmount=on tank/var/lib

Then we can copy over all the current data from /var/lib.

# 1. stop all services accessing `/var/lib`
# 2. copy the data, preserving ownership and permissions
sudo cp -a /var/lib/* /tank/var/lib/
sudo rm -rf /var/lib/
sudo zfs set mountpoint=/var/lib tank/var/lib
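To double check that the dataset really took over /var/lib:

# The dataset should now be mounted at /var/lib
zfs list -o name,mountpoint,mounted tank/var/lib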

And here is the rest of my NixOS config for ZFS:

# Setup ZFS
# Official resources:
# - https://nixos.wiki/wiki/ZFS
# - https://openzfs.github.io/openzfs-docs/Getting%20Started/NixOS/index.html#installation

# Enable support for ZFS and always use a compatible kernel
boot.supportedFilesystems = [ "zfs" ];
boot.zfs.forceImportRoot = false;
boot.kernelPackages = config.boot.zfs.package.latestCompatibleLinuxPackages;

# head -c 8 /etc/machine-id
# The primary use case is to ensure when using ZFS 
# that a pool isn’t imported accidentally on a wrong machine.
networking.hostId = "aaaaaaaa";

# Enable scrubbing once a week
# https://openzfs.github.io/openzfs-docs/man/master/8/zpool-scrub.8.html
services.zfs.autoScrub.enable = true;

# Names of the pools to import
boot.zfs.extraPools = [ "tank" ];

And in the end run sudo nixos-rebuild switch to build and switch to the new configuration.
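After a reboot it is worth verifying that the pool was actually imported, which, as the next section shows, is exactly where things went wrong for me:

zpool status tank
zfs list -r tank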

Fucked up ZFS Pool

In the end I did everything again and started fresh, because my system did not import my ZFS pool after a reboot. Here are the key things I learned.

NixOS imports the pools by-id by running a command like this:

zpool import -d "/dev/disk/by-id" -N tank

This can be configured via boot.zfs.devNodes. It took me a while to figure out, since I usually just run zpool import tank.

And the behavior I saw was:

zpool import tank <- works
zpool import -d "/dev/disk/by-id" -N tank <- fails

As it turns out wipefs does not necessarily remove all zpool information from a disk.

$ sudo wipefs -a /dev/nvme0n1
/dev/nvme0n1: 8 bytes were erased at offset 0x1d1c10abc00 (zfs_member): 0c b1 ba 00 00 00 00 00
/dev/nvme0n1: 8 bytes were erased at offset 0x1d1c10a9800 (zfs_member): 0c b1 ba 00 00 00 00 00
/dev/nvme0n1: 8 bytes were erased at offset 0x1d1c10a8000 (zfs_member): 0c b1 ba 00 00 00 00 00
...

While wipefs reports everything as erased, we can still check with zdb and see that there is in fact still a ZFS label on the disk.

$ sudo zdb -l /dev/nvme0n1
failed to unpack label 0
------------------------------------
LABEL 1
------------------------------------
    version: 5000
    name: 'tank'
    state: 1
    txg: 47
    pool_guid: 16638860066397443734
    errata: 0
    hostid: 2138265770
    hostname: 'telesto'
    top_guid: 4799150557898763025
    guid: 4799150557898763025
    vdev_children: 1
    vdev_tree:
        type: 'disk'
        id: 0
        guid: 4799150557898763025
        path: '/dev/disk/by-id/nvme-eui.e8238fa6bf530001001b448b4e246dab'
        whole_disk: 0
        metaslab_array: 64
        metaslab_shift: 34
        ashift: 9
        asize: 2000394125312
        is_log: 0
        create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    labels = 1 2 3

ZFS keeps four copies of its label: two at the front and two at the end of the disk. So the way to clear the leftovers is to dd the right spots at the front and at the back of the disk.

# Zero the two labels at the start of the disk (first 2 MiB)
sudo dd if=/dev/zero of=/dev/nvme0n1 count=4 bs=512k
# Zero the two labels at the end; oseek is disk-specific
# (number of 512-byte sectors, pointing at roughly the last 1 MiB)
sudo dd if=/dev/zero of=/dev/nvme0n1 oseek=3907027120
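Afterwards zdb should fail to unpack all four labels:

# Expected: "failed to unpack label" for labels 0-3
sudo zdb -l /dev/nvme0n1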

There is a Superuser answer which shows how that works. And here is my lengthy back and forth where we figured out that this is the issue.


Dnsmasq on NixOS 23.05

This is a small update on the evolved configuration from my Build a simple dns with a Raspberry Pi and NixOS blog post.

I upgraded to 23.05 and learned that I should run sudo nix-collect-garbage -d from time to time to avoid running out of disk space.

And here is the updated dnsmasq configuration:

networking.hostFiles = [(pkgs.fetchurl {
  url = "https://hostname.local/l33tname/hosts/raw/branch/main/hosts";
  sha256 = "14hsqsvc97xiqlrdmknj27krxm5l50p4nhafn7a23c365yxdhlbx";
})];

services.dnsmasq.enable = true;
services.dnsmasq.alwaysKeepRunning = true;
services.dnsmasq.settings = {
  server = [ "85.214.73.63" "208.67.222.222" "62.141.58.13" ];
  cache-size = 500;
};

As you can see, with the latest version some config keys changed slightly. But the big new thing is that the hosts file is now fetched from my local Git server. This allows me to version and edit this file in a single place.

Note: the hash (nix-prefetch-url $url) needs to be updated if the file changes, otherwise NixOS will happily continue to use the file fetched last time.
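To get the new hash after editing the hosts file:

nix-prefetch-url https://hostname.local/l33tname/hosts/raw/branch/main/hosts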

MikroTik OpenVPN Updated Params

I run a site-to-site tunnel between OPNsense and MikroTik (see: OPNsense to MikroTik site-to-site tunnel). It runs fine, but the OpenVPN support in MikroTik is not very good. At some point I need to investigate WireGuard for this site-to-site connection.

But for now I still run OpenVPN, and a recent upgrade of OpenVPN on OPNsense made my tunnel fail because it could not find a common cipher.

No common cipher between server and client. Server data-ciphers: 'AES-256-GCM:AES-128-GCM:CHACHA20-POLY1305', client supports cipher 'AES-256-CBC'

As you can see, MikroTik with the settings I documented uses AES-256-CBC. According to the documentation it should also do aes256-gcm, which would match the supported AES-256-GCM.

But how would one do that? The UI does not offer any option for it. Turns out you can only change it from the terminal.

Here is how:

/interface/ovpn-client/
edit <connection-name>
value-name: auth
(opens an editor; update the value to: null, exit with Ctrl+O)

edit <connection-name>
value-name: cipher
(opens an editor; update the value to: aes256-gcm, exit with Ctrl+O)

Check with print that the settings changed.

Note: if your OpenVPN log looks something like this, it's probably still a cipher mismatch; at least in my case it was a typo.

Data Channel MTU parms [ mss_fix:1389 max_frag:0 tun_mtu:1500 tun_max_mtu:1600 headroom:136 payload:1768 tailroom:562 ET:0 ]
Outgoing Data Channel: Cipher 'AES-128-GCM' initialized with 128 bit key
Incoming Data Channel: Cipher 'AES-128-GCM' initialized with 128 bit key
Connection reset, restarting [0]
SIGUSR1[soft,connection-reset] received, client-instance restarting

Hint: make sure you changed the OPNsense server config to use AES-256-GCM!

Pull ZFS Backup

I got a good deal on an 18 TB hard disk, which was reason enough to rethink my backup setup. Until now I used a push strategy, where each system pushed its backup to my backup system (see my ZFS Remote Backups blog post for reference). This will change today!

The new strategy is that my backup system pulls the data itself. This has a few advantages; most importantly, if the main system is compromised it is harder to also compromise the backup. I will also replace the shell scripts with sanoid, or more precisely with syncoid. For snapshots I continue to use zfstools.

The New Setup

On the system which should be backed up we need to install sanoid and add a user with an SSH key and minimal ZFS permissions.

# Install package
pkg install sanoid

# Add user
pw user add -n backup -c 'Backup User' -m -s /bin/sh

# Setup SSH with key
mkdir /home/backup/.ssh
echo "ssh-ed25519 AAA...jaM0 foo@bar.example" > /home/backup/.ssh/authorized_keys
chown -R backup:backup /home/backup/.ssh 
chmod 700 /home/backup/.ssh
chmod 600 /home/backup/.ssh/authorized_keys

# Give access to the ZFS pools for the new user
zfs allow -u backup aclinherit,aclmode,compression,create,mount,destroy,hold,send,userprop,snapshot tank
zfs allow -u backup aclinherit,aclmode,compression,create,mount,destroy,hold,send,userprop,snapshot zroot
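To verify the setup, zfs allow shows the delegated permissions on the system being backed up, and from the pulling system you can confirm that the SSH access works (key path and host as used in the script below):

# On the backed-up system: show the delegated ZFS permissions
zfs allow tank

# From the pulling system: confirm the backup user can log in and list datasets
ssh -i /root/.ssh/backup-key backup@hostname-or-ip 'zfs list'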

On the system which pulls the datasets we also install sanoid, and add a small script to our crontab which does all the magic and pulls all the datasets we want to back up. It also pushes the status to InfluxDB so alerting and graphing can be done. (Careful with the script: there are some things you need to update for your use case!)

# Install package
pkg install sanoid

# Put script in crontab
$ crontab -l
13       0 * * 7	/root/backup.sh

The /root/backup.sh script:

#!/bin/sh

REMOTE='backup@hostname-or-ip'
KEY='/root/.ssh/backup-key'
lockfile='/tmp/backup.pid'
logfile=/var/log/backup/hostname_log.txt

mkdir -p $(dirname $logfile)

if [ ! -f $lockfile ]
then
    echo $$ > $lockfile
else
    echo "$(date): early exit ${lockfile} does exist previous backup still running" | tee -a $logfile
    exit 13
fi

# Backup a ZFS dataset by pulling it.
# localhost is the host where this script runs,
# whereas remote is the host which should get backed up.
# $1: name of the dataset on the remote host
# $2: name of the dataset on the local host
# return: a status code, 0 if successful
backup_dataset() {
    remote_ds=$1
    local_ds=$2

    syncoid --sshkey=${KEY} --recursive --no-privilege-elevation ${REMOTE}:${remote_ds} ${local_ds} >> /tmp/raw_backup.log 2>&1
    code=$?
    echo "$(date): pulling ${remote_ds} -> ${local_ds} exit code was: ${code}" >> $logfile
    echo $code
}

start=$(date +%s)
echo "$(date): backup started (log: $logfile)" | tee -a $logfile

exit_code=0
exit_code=$((exit_code + $(backup_dataset 'tank/backup' 'tank/backup')))
exit_code=$((exit_code + $(backup_dataset 'tank/data' 'tank/data')))
exit_code=$((exit_code + $(backup_dataset 'tank/music' 'tank/music')))
exit_code=$((exit_code + $(backup_dataset 'tank/photography' 'tank/photography')))
exit_code=$((exit_code + $(backup_dataset 'tank/podcast' 'tank/podcast')))
exit_code=$((exit_code + $(backup_dataset 'zroot/iocage' 'tank/iocage')))
exit_code=$((exit_code + $(backup_dataset 'zroot/usr/home' 'tank/hostname-home')))

end=$(date +%s)
runtime=$((end-start))
echo "$(date): exit code: ${exit_code} script ran for ~$((runtime / 60)) minutes ($runtime seconds)" | tee -a $logfile

curl -i -XPOST -u mrinflux:password 'https://influx.host.example:8086/write?db=thegreatedb' \
        --data-binary "backup,host=hostname.example status=${exit_code}i
        backuptime,host=hostname.example value=${runtime}i"

rm -f $lockfile
exit $exit_code
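Before trusting cron with it, I would run the script once by hand and check the log:

sh /root/backup.sh; echo "exit: $?"
tail /var/log/backup/hostname_log.txt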