Boot across the network
Jun 9, 2024
m73
lenovo
homelab
kubernetes
tftp
nfs
dhcp
Putting together a homelab Kubernetes cluster in my own stubborn way. I’m assuming a reader who’s basically me before I embarked on this little expedition, so I won’t go into minute detail about day-to-day Linux setup and administration - only the things that are new to me and have changed since I last encountered them.
- Part 0 - Best laid plans
- Part 1 - Installing the hardware
- Part 2 - Boot across the network
- Part 3 - PXE Booting Debian with an NFS Root Filesystem
- Part 4 - Filesystems for everybody!
- Part 5 - Kubernetes at last
Sections added as I actually proceed with this!
Another day another thumb drive
Once I had the machine I’d selected to be my Cluster Gateway set up to my satisfaction (including attaching a 256GB SATA SSD to the internal socket), I installed Debian 12 Bookworm in the Server (no desktop) mode. I used the debian-12.5.0-amd64-netinst.iso network-install image, and put that onto another USB thumb drive with Startup Disk Creator as usual (using my Ubuntu laptop). Booting that on the Cluster Gateway, I completed the installation and verified that I could boot up directly from the SATA drive, still see both the network devices, and ping the internet on the one connected to my normal network switch and router.
The Telia router was assigning it the 192.168.1.157 IP address as expected, so I also mapped a subdomain of my own to the machine for my own convenience - so this machine becomes cluster-gateway.paperstack.com (and that currently resolves to that private IP address). Don’t use that domain name for anything of your own though; I might change it or delete it without notice. But at the time of writing you can use dig or nslookup to show that it resolves as expected.
$ dig A +short cluster-gateway.paperstack.com
192.168.1.157
I also verified that I could ssh into it from my Ubuntu laptop - and that this still worked ok even if it was running headless with no keyboard.
Any port in a storm
The next step was to set up that second ethernet device. The “primary” device was enp2s0 and that’s set up to connect to my router. The secondary device was eno1, and this needs to be specified as the gateway for the new subnet. A reminder of how those devices look at this point:
$ ip address
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host noprefixroute
valid_lft forever preferred_lft forever
2: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 00:e0:4c:68:01:77 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.157/24 brd 192.168.1.255 scope global dynamic enp2s0
valid_lft 2631sec preferred_lft 2631sec
inet6 2001:2042:3750:d500:2e0:4cff:fe68:177/64 scope global dynamic mngtmpaddr
valid_lft 1170sec preferred_lft 1170sec
inet6 fe80::2e0:4cff:fe68:177/64 scope link
valid_lft forever preferred_lft forever
3: eno1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 00:23:24:94:35:59 brd ff:ff:ff:ff:ff:ff
altname enp0s25
I need to specify an IP address and subnet for eno1 in the /etc/network/interfaces configuration file. To do this I add the following lines:
auto eno1
iface eno1 inet static
address 192.168.0.254/24
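If you’d rather not reboot to pick this up, the interface can be brought up (and inspected) by hand:
$ sudo ifup eno1            # bring up the newly defined interface
$ ip address show eno1      # confirm it has the static address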
The Debian NetworkConfiguration wiki page is good for an overview of this stuff, and the interfaces(5) man page has the details.
Here auto means that the device will be brought up by ifup at boot time. The iface line defines a network interface for this device, inet specifies that it’s for TCP/IP (i.e. internet protocol), and static means that it is explicitly defined here and won’t look for a DHCP server on the network to figure out its address. Then address defines that static address.
The combination of the interface’s specific IP address 192.168.0.254 and the CIDR prefix /24 tells us that the first 24 bits are the network prefix and the last 8 bits are the host part - so the subnet covers addresses 192.168.0.0 through to 192.168.0.255 (the bit arithmetic is spelled out after the listing below). The highest address in a subnet’s range is the broadcast address - so after a reboot (or an ifup) the interface config looks good:
$ ip address
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host noprefixroute
valid_lft forever preferred_lft forever
2: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 00:e0:4c:68:01:77 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.157/24 brd 192.168.1.255 scope global dynamic enp2s0
valid_lft 3520sec preferred_lft 3520sec
inet6 2001:2042:3750:d500:2e0:4cff:fe68:177/64 scope global dynamic mngtmpaddr
valid_lft 1173sec preferred_lft 1173sec
inet6 fe80::2e0:4cff:fe68:177/64 scope link
valid_lft forever preferred_lft forever
3: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 00:23:24:94:35:59 brd ff:ff:ff:ff:ff:ff
altname enp0s25
inet 192.168.0.254/24 brd 192.168.0.255 scope global eno1
valid_lft forever preferred_lft forever
inet6 fe80::223:24ff:fe94:3559/64 scope link
valid_lft forever preferred_lft forever
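As an aside, here is that /24 arithmetic spelled out. If you want a tool to do it for you, the ipcalc package (sudo apt install ipcalc - it’s not installed by default) prints the same breakdown:
# 192.168.0.254 -> 11000000.10101000.00000000.11111110
# /24 netmask   -> 11111111.11111111.11111111.00000000
# network   = 192.168.0.0    (host bits all 0)
# broadcast = 192.168.0.255  (host bits all 1)
# leaving 192.168.0.1 - 192.168.0.254 for hosts
$ ipcalc 192.168.0.254/24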
A more normal address for the gateway/router would be 192.168.0.1. I forgot that while I was setting it up, and it doesn’t actually matter very much. If you prefer a less weird address for your gateway, amend accordingly!
IP addresses for everyone!
Next up was to set up the cluster gateway as a DHCP server on the eno1 interface, handing out IP addresses to any machine that boots up while attached to the cluster switch. This is fairly straightforward! Firstly, install the DHCP daemon, for which I used isc-dhcp-server - apparently that’s deprecated in favour of the Kea server from the same group, but I’m not looking for perfection in this config and there was more info readily available on isc-dhcp. Installing in Debian is just the usual apt magic:
$ sudo apt install isc-dhcp-server
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
policycoreutils selinux-utils
Suggested packages:
policykit-1 isc-dhcp-server-ldap ieee-data
The following NEW packages will be installed:
isc-dhcp-server policycoreutils selinux-utils
0 upgraded, 3 newly installed, 0 to remove and 0 not upgraded.
Need to get 1,766 kB of archives.
After this operation, 7,818 kB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Get:1 http://deb.debian.org/debian bookworm/main amd64 isc-dhcp-server amd64 4.4.3-P1-2 [1,479 kB]
Get:2 http://deb.debian.org/debian bookworm/main amd64 selinux-utils amd64 3.4-1+b6 [126 kB]
Get:3 http://deb.debian.org/debian bookworm/main amd64 policycoreutils amd64 3.4-1 [161 kB]
Fetched 1,766 kB in 0s (21.8 MB/s)
Preconfiguring packages ...
Selecting previously unselected package isc-dhcp-server.
(Reading database ... 34186 files and directories currently installed.)
Preparing to unpack .../isc-dhcp-server_4.4.3-P1-2_amd64.deb ...
Unpacking isc-dhcp-server (4.4.3-P1-2) ...
Selecting previously unselected package selinux-utils.
Preparing to unpack .../selinux-utils_3.4-1+b6_amd64.deb ...
Unpacking selinux-utils (3.4-1+b6) ...
Selecting previously unselected package policycoreutils.
Preparing to unpack .../policycoreutils_3.4-1_amd64.deb ...
Unpacking policycoreutils (3.4-1) ...
Setting up selinux-utils (3.4-1+b6) ...
Setting up policycoreutils (3.4-1) ...
Setting up isc-dhcp-server (4.4.3-P1-2) ...
Generating /etc/default/isc-dhcp-server...
Job for isc-dhcp-server.service failed because the control process exited with error code.
See "systemctl status isc-dhcp-server.service" and "journalctl -xeu isc-dhcp-server.service" for details.
invoke-rc.d: initscript isc-dhcp-server, action "start" failed.
× isc-dhcp-server.service - LSB: DHCP server
Loaded: loaded (/etc/init.d/isc-dhcp-server; generated)
Active: failed (Result: exit-code) since Wed 2024-06-05 20:30:33 CEST; 10ms ago
Docs: man:systemd-sysv-generator(8)
Process: 773 ExecStart=/etc/init.d/isc-dhcp-server start (code=exited, status=1/FAILURE)
CPU: 17ms
Jun 05 20:30:31 cluster-gateway dhcpd[785]: bugs on either our web page at www.isc.org or in the README file
Jun 05 20:30:31 cluster-gateway dhcpd[785]: before submitting a bug. These pages explain the proper
Jun 05 20:30:31 cluster-gateway dhcpd[785]: process and the information we find helpful for debugging.
Jun 05 20:30:31 cluster-gateway dhcpd[785]:
Jun 05 20:30:31 cluster-gateway dhcpd[785]: exiting.
Jun 05 20:30:33 cluster-gateway isc-dhcp-server[773]: Starting ISC DHCPv4 server: dhcpdcheck syslog for diagnostics. ... failed!
Jun 05 20:30:33 cluster-gateway isc-dhcp-server[773]: failed!
Jun 05 20:30:33 cluster-gateway systemd[1]: isc-dhcp-server.service: Control process exited, code=exited, status=1/FAILURE
Jun 05 20:30:33 cluster-gateway systemd[1]: isc-dhcp-server.service: Failed with result 'exit-code'.
Jun 05 20:30:33 cluster-gateway systemd[1]: Failed to start isc-dhcp-server.service - LSB: DHCP server.
Processing triggers for man-db (2.11.2-2) ...
Note that it installed but failed to start - the next step was to clean up the configuration. In /etc/dhcp/dhcpd.conf I amended (for the option commands) and added (for the rest) the following:
option domain-name "paperstack.com";
option domain-name-servers 8.8.8.8, 4.4.4.4;
default-lease-time 600;
max-lease-time 7200;
subnet 192.168.0.0 netmask 255.255.255.0 {
range 192.168.0.1 192.168.0.253;
option routers 192.168.0.254;
}
As you can see, this will hand out IP addresses in the range 192.168.0.1 through to 192.168.0.253 to clients on the subnet (omitting this machine’s own IP address, which is static). It will advertise itself (192.168.0.254) as the router. The DHCP server and the router don’t have to be the same machine, but I’m making this cluster gateway do all the dirty jobs.
I also edited /etc/default/isc-dhcp-server and changed the line reading:
INTERFACESv4=""
To read:
INTERFACESv4="eno1"
This tells isc-dhcp-server which interface it’s offering DHCP services on.
After a reboot, checking the service status via sudo journalctl -u isc-dhcp-server.service, the output looks promising:
Jun 05 21:02:17 cluster-gateway systemd[1]: Starting isc-dhcp-server.service - LSB: DHCP server...
Jun 05 21:02:17 cluster-gateway isc-dhcp-server[494]: Launching IPv4 server only.
Jun 05 21:02:17 cluster-gateway dhcpd[509]: Wrote 0 leases to leases file.
Jun 05 21:02:17 cluster-gateway dhcpd[509]: Server starting service.
Jun 05 21:02:19 cluster-gateway isc-dhcp-server[494]: Starting ISC DHCPv4 server: dhcpd.
Jun 05 21:02:19 cluster-gateway systemd[1]: Started isc-dhcp-server.service - LSB: DHCP server.
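Once clients start asking for addresses, the leases the server hands out are recorded in /var/lib/dhcp/dhcpd.leases, which is a useful place to look if a node doesn’t seem to be getting an address:
$ sudo cat /var/lib/dhcp/dhcpd.leases
# each allocation shows up as a "lease 192.168.0.x { ... }" block containing
# the lease start/end times and the client's MAC address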
Next up I took one of the machines destined to be a worker node, connected that to the cluster gateway, and booted from my Ubuntu USB stick. Once booted up and in “Try Ubuntu” mode I fired up a terminal…
ubuntu@ubuntu:~$ ip address
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 44:8a:5b:dd:d8:ba brd ff:ff:ff:ff:ff:ff
altname enp0s25
inet 192.168.0.1/24 brd 192.168.0.255 scope global dynamic noprefixroute eno1
valid_lft 506sec preferred_lft 506sec
inet6 fe80::c32d:27c6:68e1:3b7a/64 scope link noprefixroute
valid_lft forever preferred_lft forever
Hurrah! Its interface eno1 (the built-in one - this machine only has that one ethernet device) has been allocated the first available IP address of 192.168.0.1 from the pool and brought up. Let’s see what the routing table looks like:
ubuntu@ubuntu:~$ ip route
default via 192.168.0.254 dev eno1 proto dhcp metric 20100
169.254.0.0/16 dev eno1 scope link metric 1000
192.168.0.0/24 dev eno1 proto kernel scope link src 192.168.0.1 metric 100
Again, that looks right - the default route is via 192.168.0.254, which is the cluster gateway on that network - and the subnet 192.168.0.0/24 is described as being reachable via the eno1 interface. This all makes sense. Finally, testing it for real: pinging out to the internet (8.8.8.8) fails at this point, as expected, but can I log in to the gateway machine over the network?
ubuntu@ubuntu:~$ ping -n 1 -w 1 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(124) bytes of data.
--- 8.8.8.8 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
ubuntu@ubuntu:~$ ssh gateway@192.168.0.254
gateway@192.168.0.254's password:
Linux cluster-gateway 6.1.0-21-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.90-1 (2024-05-03) x86_64
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Thu Jun 6 12:44:46 2024 from 192.168.0.1
gateway@cluster-gateway:~$ exit
logout
Connection to 192.168.0.254 closed.
Yes I can. So that’s the most fundamental stuff set up. On the other hand… can I ping my laptop that lives on the other subnet?
$ ping 192.168.1.100
PING 192.168.1.100 (192.168.1.100) 56(84) bytes of data.
^C
--- 192.168.1.100 ping statistics ---
6 packets transmitted, 0 received, 100% packet loss, time 5118ms
Nope. 100% packet loss. But that’s expected.
Network shall speak unto network
The problem is that the cluster gateway hasn’t been told to do anything with packets received from the cluster subnet that are destined somewhere else. And even if it had, my laptop doesn’t know what to do with packets destined for the cluster subnet. If we fix both of those, we should be able to ping in either direction between the subnets.
Setting up the forwarding on the cluster gateway is amazingly simple. To do this temporarily:
$ sudo su -
$ echo 1 > /proc/sys/net/ipv4/ip_forward
This turns on forwarding of IPv4 packets between the networks attached to the machine! Then on a machine on the general subnet (my laptop), I add an entry to its routing table:
$ sudo route add -net 192.168.0.0/24 gw 192.168.1.157
This basically tells my laptop to send packets destined for the subnet 192.168.0.0/24 (i.e. 192.168.0.x) to the address 192.168.1.157 on the local subnet - i.e. to the cluster gateway.
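Both of those temporary changes have more modern equivalents, if you prefer sysctl and iproute2 to echoing into /proc and the old route command:
# on the cluster gateway - same effect as the echo above
$ sudo sysctl -w net.ipv4.ip_forward=1
# on the laptop - same effect as the route add above
$ sudo ip route add 192.168.0.0/24 via 192.168.1.157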
With those changes made, my laptop can ping a worker node on the cluster subnet…
$ ping 192.168.0.1
PING 192.168.0.1 (192.168.0.1) 56(84) bytes of data.
64 bytes from 192.168.0.1: icmp_seq=1 ttl=63 time=3.48 ms
64 bytes from 192.168.0.1: icmp_seq=2 ttl=63 time=3.37 ms
64 bytes from 192.168.0.1: icmp_seq=3 ttl=63 time=3.25 ms
^C
--- 192.168.0.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2004ms
rtt min/avg/max/mdev = 3.252/3.365/3.477/0.091 ms
…and that worker node can ping my laptop…
ubuntu@ubuntu:~$ ping 192.168.1.100
PING 192.168.1.100 (192.168.1.100) 56(84) bytes of data.
64 bytes from 192.168.1.100: icmp_seq=1 ttl=63 time=118 ms
64 bytes from 192.168.1.100: icmp_seq=2 ttl=63 time=2.46 ms
^C
--- 192.168.1.100 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 2.458/60.294/118.131/57.836 ms
So to make that forwarding permanent on the cluster gateway, I added the following line to /etc/sysctl.conf:
net.ipv4.ip_forward=1
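That file is read at boot; to apply the setting immediately without rebooting you can reload it:
$ sudo sysctl -p
# re-applies the settings in /etc/sysctl.conf and echoes each one it sets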
After a reboot I verified that all this stuff was working ok. I didn’t make the new route to the cluster permanent on my laptop yet (I’ll wait and see if I find that useful later). For now it’s easy enough to add when I need that access.
What doesn’t work at this point, though, is communicating with anything outside my local networks.
For example, if I try to ping the Google DNS servers at 8.8.8.8 from a worker node on address 192.168.0.1, then my packet will be sent to the cluster gateway, the cluster gateway will forward it to the Telia router, and the Telia router will then pretend that the packet came from its own public address and send it off to Google.
On the way back, though, when the packet returns to the Telia router, it will remember that it really originated from 192.168.0.1… and at this point get stuck. The Telia router knows nothing about that subnet. It only knows how to send packets to its own private subnet of 192.168.1.0/24 or the public internet. Some other private subnet is a complete mystery that it can’t resolve. It also doesn’t provide any way to tell it about the cluster gateway in the way I could on my laptop.
The solution is to use the same trick the Telia router uses to hide its own private subnet from the broader internet: the cluster gateway can hide the cluster subnet from the Telia router in exactly the same way. That trick is Network Address Translation (NAT), and it’s easy enough to set up.
To set it up temporarily on the cluster gateway…
$ sudo iptables -t nat -A POSTROUTING -o enp2s0 -j MASQUERADE --random
…appends a masquerading rule to the gateway’s NAT rules, so that any packet leaving via enp2s0 is rewritten to look like it came from the gateway’s own address on that interface.
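You can double-check that the rule landed by listing the POSTROUTING chain of the nat table:
$ sudo iptables -t nat -L POSTROUTING -n -v
# should show a MASQUERADE rule with the out interface set to enp2s0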
With this change pinging Google from the worker node now works:
ubuntu@ubuntu:~$ ping -c 1 -w 1 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=55 time=2.59 ms
--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 2.588/2.588/2.588/0.000 ms
I was also able to use wget to download a file from an internet URL…
ubuntu@ubuntu:~$ wget paperstack.com
--2024-06-12 18:23:11-- http://paperstack.com/
Resolving paperstack.com (paperstack.com)... 52.85.243.6, 52.85.243.86, 52.85.243.76, ...
Connecting to paperstack.com (paperstack.com)|52.85.243.6|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://paperstack.com/ [following]
--2024-06-12 18:23:11-- https://paperstack.com/
Connecting to paperstack.com (paperstack.com)|52.85.243.6|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 18484 (18K) [text/html]
Saving to: ‘index.html’
index.html 100%[=====================================================================================================================================================================>] 18.05K --.-KB/s in 0s
2024-06-12 18:23:11 (99.9 MB/s) - ‘index.html’ saved [18484/18484]
ubuntu@ubuntu:~$ grep 'title' index.html
<title>Home | Paperstack</title>
…and browse a web page in Firefox.
To make these NAT changes permanent, the easiest way is to install the package iptables-persistent, which will prompt you to make the existing (temporary) iptables entry permanent by writing it to config files. If it’s already installed, you can also save the current iptables entries with netfilter-persistent; the output should be similar to the following:
$ sudo service netfilter-persistent save
Saving netfilter rules...run-parts: executing /usr/share/netfilter-persistent/plugins.d/15-ip4tables save
run-parts: executing /usr/share/netfilter-persistent/plugins.d/25-ip6tables save
done.
After all this I rebooted all the machines and verified that the various pings and accesses were still working ok.
Files across the network
At this point, with a DHCP server on the subnet¹ under my full control, I was ready to configure PXE (network) booting. This ran through pretty cleanly, although after each step I fired up a worker node (without the USB stick!) to make sure I was getting appropriate output, both in the BIOS boot messages and in the various logs. I’ll skip over some of those details.
Firstly, setting up DHCP to point the client at a TFTP server (which didn’t at this point exist) is a matter of making some more changes to the subnet entry in the /etc/dhcp/dhcpd.conf file:
subnet 192.168.0.0 netmask 255.255.255.0 {
range 192.168.0.1 192.168.0.253;
option routers 192.168.0.254;
filename "pxelinux.0";
next-server 192.168.0.254;
}
Here the next-server entry tells the booting client the subnet IP address of the TFTP server (again the cluster gateway is taking this role), and then the filename entry tells it to download and then boot the pxelinux.0 binary.
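dhcpd only reads its configuration at startup, so the new options need a service restart to take effect:
$ sudo systemctl restart isc-dhcp-server
$ systemctl status isc-dhcp-server
# confirm it came back up cleanly with the new subnet options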
Once that’s done (or after rebooting the cluster gateway), a new worker node should report at boot that it knows the address of the TFTP server - albeit it won’t be able to boot, because there’s no active TFTP service or any boot files for it to find yet! Note also that the network device must be included in the worker node’s BIOS boot sequence.
Next up was setting up a TFTP server on the cluster gateway.
$ sudo apt install tftpd-hpa
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Suggested packages:
pxelinux
The following NEW packages will be installed:
tftpd-hpa
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 41.9 kB of archives.
After this operation, 117 kB of additional disk space will be used.
Get:1 http://deb.debian.org/debian bookworm/main amd64 tftpd-hpa amd64 5.2+20150808-1.4 [41.9 kB]
Fetched 41.9 kB in 0s (754 kB/s)
Preconfiguring packages ...
Selecting previously unselected package tftpd-hpa.
(Reading database ... 34742 files and directories currently installed.)
Preparing to unpack .../tftpd-hpa_5.2+20150808-1.4_amd64.deb ...
Unpacking tftpd-hpa (5.2+20150808-1.4) ...
Setting up tftpd-hpa (5.2+20150808-1.4) ...
Processing triggers for man-db (2.11.2-2) ...
Then looking at the config file for the TFTP service…
$ cat /etc/default/tftpd-hpa
# /etc/default/tftpd-hpa
TFTP_USERNAME="tftp"
TFTP_DIRECTORY="/srv/tftp"
TFTP_ADDRESS=":69"
TFTP_OPTIONS="--secure"
This will serve any files that are placed under the /srv/tftp path.
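Before testing from a client it’s worth confirming that the daemon is actually up and listening on UDP port 69:
$ sudo ss -ulpn | grep ':69'
# expect a line showing in.tftpd bound to 0.0.0.0:69 (or [::]:69)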
To test that the tftp server is working, I installed the tftp client:
$ sudo apt install tftp-hpa
Then copied a file into the /srv/tftp directory and made sure I could retrieve it:
$ sudo su -
$ echo "hello" > /srv/tftp/readme.txt
$ exit
logout
$ cd
$ tftp localhost
tftp> get readme.txt
tftp> quit
$ cat readme.txt
hello
Incidentally, if the local file is not writable you’ll get “permission denied” when you try to retrieve the file - that’s a problem at the client end, not with the server. I hit this silly issue when I tried to retrieve the file while still sitting in the /srv/tftp directory but no longer logged in as root: the client was effectively trying to overwrite a root-owned file as an ordinary user, and couldn’t. It will probably make sense if you’ve just done this yourself!
Next let’s write something that we can boot from into that directory. There’s a network-bootable Debian installer for Bookworm (the same version of Debian running on the cluster gateway) available as a tar.gz file, and I unpacked that into the /srv/tftp directory:
$ sudo su -
$ cd /srv/tftp
$ sudo wget https://deb.debian.org/debian/dists/bookworm/main/installer-amd64/current/images/netboot/netboot.tar.gz
$ tar -xvf netboot.tar.gz
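Before rebooting the worker node, a quick sanity check that the file advertised by dhcpd.conf is actually there at the top of the TFTP root:
$ ls /srv/tftp
# amongst other things this should now include pxelinux.0 (the filename from
# dhcpd.conf) and a debian-installer/ directory holding the kernel and initrd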
Then I restarted the worker node, again with no USB drive or anything in it - just the network connection. Behold: it boots straight over the network into the Debian installer menu.
So now the essentials are there to get an operating system onto the worker nodes automatically.
Getting ready for the next steps
The last step in this part of the cluster gateway configuration was to set up a Network File System (NFS) share - when I’m booting my worker nodes I want them to use NFS for their mounted filesystems. This way they don’t need a physical disk individually installed. It will probably be a bit slow, but speed is not of the essence for this underpowered toy cluster. I do have a Synology NAS on my main subnet that’s capable of serving NFS and I might switch to using that later on (as the cluster gateway machine only has a very small SSD installed in it). However, I wanted to minimise any network weirdness that might ensue from trying to network-mount a filesystem across the cluster gateway bridge initially.
I’d not set up NFS before and I was pleasantly surprised by how easy it was. Installing the server service:
$ sudo apt install nfs-kernel-server
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
nfs-kernel-server
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 151 kB of archives.
After this operation, 626 kB of additional disk space will be used.
Get:1 http://deb.debian.org/debian bookworm/main amd64 nfs-kernel-server amd64 1:2.6.2-4 [151 kB]
Fetched 151 kB in 0s (5,146 kB/s)
Selecting previously unselected package nfs-kernel-server.
(Reading database ... 34951 files and directories currently installed.)
Preparing to unpack .../nfs-kernel-server_1%3a2.6.2-4_amd64.deb ...
Unpacking nfs-kernel-server (1:2.6.2-4) ...
Setting up nfs-kernel-server (1:2.6.2-4) ...
nfs-mountd.service is a disabled or a static unit not running, not starting it.
nfsdcld.service is a disabled or a static unit not running, not starting it.
Processing triggers for man-db (2.11.2-2) ...
Creating a directory to be shared via NFS:
sudo mkdir /clients
Then add a line to the NFS export configuration file, /etc/exports:
/clients 192.168.0.0/255.255.255.0(rw,no_root_squash,subtree_check)
That declares that any host on the 192.168.0.0/24 subnet (written here with an explicit netmask rather than CIDR notation) can mount the filesystem read-write. The next command makes the export take effect:
sudo exportfs -a
Then to reload the NFS service to make this visible from client machines:
sudo systemctl restart nfs-mountd.service
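You can confirm what’s actually being exported, and with which options, using exportfs:
$ sudo exportfs -v
# should list /clients exported to 192.168.0.0/255.255.255.0 with rw and no_root_squash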
After that I wrote a readme.txt file under /clients so I would be able to verify that the network files were working ok.
sudo su -
uname -a > /clients/readme.txt
With that I was ready to boot up a worker node from the Ubuntu thumb drive² in “Try Ubuntu” mode and add the NFS tools to test the new NFS mount…
ubuntu@ubuntu:~$ sudo apt install nfs-common
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
libevent-core-2.1-7 rpcbind
Suggested packages:
open-iscsi watchdog
The following NEW packages will be installed:
libevent-core-2.1-7 nfs-common rpcbind
0 upgraded, 3 newly installed, 0 to remove and 234 not upgraded.
Need to get 381 kB of archives.
After this operation, 1,328 kB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Get:1 http://archive.ubuntu.com/ubuntu jammy/main amd64 libevent-core-2.1-7 amd64 2.1.12-stable-1build3 [93.9 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy/main amd64 rpcbind amd64 1.2.6-2build1 [46.6 kB]
Get:3 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 nfs-common amd64 1:2.6.1-1ubuntu1.2 [241 kB]
Fetched 381 kB in 1s (471 kB/s)
Selecting previously unselected package libevent-core-2.1-7:amd64.
(Reading database ... 208650 files and directories currently installed.)
Preparing to unpack .../libevent-core-2.1-7_2.1.12-stable-1build3_amd64.deb ...
Unpacking libevent-core-2.1-7:amd64 (2.1.12-stable-1build3) ...
Selecting previously unselected package rpcbind.
Preparing to unpack .../rpcbind_1.2.6-2build1_amd64.deb ...
Unpacking rpcbind (1.2.6-2build1) ...
Selecting previously unselected package nfs-common.
Preparing to unpack .../nfs-common_1%3a2.6.1-1ubuntu1.2_amd64.deb ...
Unpacking nfs-common (1:2.6.1-1ubuntu1.2) ...
Setting up rpcbind (1.2.6-2build1) ...
Created symlink /etc/systemd/system/multi-user.target.wants/rpcbind.service → /l
ib/systemd/system/rpcbind.service.
Created symlink /etc/systemd/system/sockets.target.wants/rpcbind.socket → /lib/s
ystemd/system/rpcbind.socket.
Setting up libevent-core-2.1-7:amd64 (2.1.12-stable-1build3) ...
Setting up nfs-common (1:2.6.1-1ubuntu1.2) ...
Creating config file /etc/idmapd.conf with new version
Creating config file /etc/nfs.conf with new version
Adding system user `statd' (UID 130) ...
Adding new user `statd' (UID 130) with group `nogroup' ...
Not creating home directory `/var/lib/nfs'.
Created symlink /etc/systemd/system/multi-user.target.wants/nfs-client.target →
/lib/systemd/system/nfs-client.target.
Created symlink /etc/systemd/system/remote-fs.target.wants/nfs-client.target → /
lib/systemd/system/nfs-client.target.
auth-rpcgss-module.service is a disabled or a static unit, not starting it.
nfs-idmapd.service is a disabled or a static unit, not starting it.
nfs-utils.service is a disabled or a static unit, not starting it.
proc-fs-nfsd.mount is a disabled or a static unit, not starting it.
rpc-gssd.service is a disabled or a static unit, not starting it.
rpc-statd-notify.service is a disabled or a static unit, not starting it.
rpc-statd.service is a disabled or a static unit, not starting it.
rpc-svcgssd.service is a disabled or a static unit, not starting it.
rpc_pipefs.target is a disabled or a static unit, not starting it.
var-lib-nfs-rpc_pipefs.mount is a disabled or a static unit, not starting it.
Processing triggers for man-db (2.10.2-1) ...
Processing triggers for libc-bin (2.35-0ubuntu3.6) ...
Then mounting the NFS share (the /clients mount point needs to exist on the worker node first) and checking that I can read the contents of that readme.txt file:
ubuntu@ubuntu:~$ sudo mount -t nfs 192.168.0.254:/clients /clients
ubuntu@ubuntu:~$ ls /clients
readme.txt
ubuntu@ubuntu:~$ cat /clients/readme.txt
Linux cluster-gateway 6.1.0-21-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.90-1 (2024-05-03) x86_64 GNU/Linux
Ok, so far so good, NFS is up and running. Now I “just” have to figure out how to put all of these pieces together.
Next
The next steps in part 3 are to get a worker node booting into a Debian install where the filesystem is mounted on the NFS share. This requires some fiddling with the contents of the TFTP server and using the Debootstrap tool to create a suitable Debian installation filesystem.
Part 3 - PXE Booting Debian with an NFS Root Filesystem
Some more handy resources
- https://wiki.debian.org/NetworkConfiguration
- https://wiki.debian.org/DHCP_Server
- https://ubuntu.com/server/docs/how-to-install-and-configure-isc-dhcp-server
- https://wiki.debian.org/NetworkConfiguration#Setting_up_an_Ethernet_Interface
- https://deb.debian.org/debian/dists/bookworm/main/installer-amd64/current/images/netboot/
- https://wiki.debian.org/NFSServerSetup
- https://wiki.syslinux.org/wiki/index.php?title=PXELINUX#Custom_Menu_Example_with_sub-menus
- https://www.digitalocean.com/community/tutorials/how-to-set-up-an-nfs-mount-on-debian-11
Footnotes
¹ The PXE network booting process requires the TFTP server to be on the local subnet - in part because it has no security to speak of.
² *Not* the PXE boot - if you have network booting taking precedence over USB booting, the Lenovos will drop to a boot selection screen if you hit F12 during boot