Ceph Cluster with Raspberry Pi 5 and NVMe SSDs

As described in an earlier article, I run a Kubernetes cluster on my Proxmox VE hypervisor using Talos Linux. For this Kubernetes cluster I built a Ceph cluster as a storage solution. This time I did not want to do it with virtual machines on my hypervisor; I chose to set it up on bare metal in order to learn a bit about how Ceph deals with hardware and different storage devices. Along the way I ran into some bugs during the Ceph installation and while provisioning the Ceph CSI on the Kubernetes cluster, so I decided to share my experience. It might help someone out there and save them some time.

The Proxmox VE hypervisor offers its own Ceph cluster solution, but it is geared towards providing storage pools for virtual machines on that hypervisor. It also deploys just one Ceph monitor per hypervisor host. The Ceph monitor maintains a map of the state of the cluster, and you need at least three monitors to be redundant and highly available. So if you have only one Proxmox VE server running (like me), this is not a good option. On top of that, the Ceph installation is tightly coupled and intermingled with Proxmox VE, so tinkering with its configuration and cluster authentication caused problems when I tried it out.

Hardware

I chose the Raspberry Pi 5 as a platform because it has enough computing power and RAM for a small lab cluster with little usage/traffic. For a while now, add-on HATs for NVMe SSDs have been available for it, so I thought that could be a speedy and affordable way to handle storage. I then basically threw all of this into a 10'' desktop rack together with a PoE switch.

Ceph Cluster in 10'' rack

Parts list:

Installing Ubuntu on Raspberry Pis

Checking the OS recommendations for Ceph, I learned that Ubuntu should be a good option: Ubuntu 22.04 was tested with Ceph v18.2. The problem is that for the Raspberry Pi 5 only Ubuntu 24.04 is available; Ubuntu 22.04 will not be backported to the Raspberry Pi 5. So I flashed Ubuntu 24.04 onto the micro SD cards using the Raspberry Pi Imager. I found it quite useful to use the customization options of the Raspberry Pi Imager and provision hostnames and SSH keys while flashing the images.
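
After the first boot you can quickly confirm on each Pi that the right release landed and that the hostname set in the Raspberry Pi Imager stuck (just a sanity check, nothing Ceph-specific):

hostnamectl    # shows the hostname and "Operating System: Ubuntu 24.04 ..."
uname -m       # should print aarch64 on a Raspberry Pi 5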

The Cephadm Bug in v19.2.0

The recommended installation method nowadays is cephadm. After installing Ubuntu, I noticed that only Docker was missing from its requirements, so I quickly installed it on all nodes:

apt update
apt install docker.io
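
Before continuing, it does not hurt to check that the Docker daemon is actually running (cephadm can also work with Podman, but I went with Docker):

sudo systemctl is-active docker
sudo docker version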

You need to pick one of the Raspberry Pis to be your admin node and install cephadm on it. I chose to install the Ubuntu package for simplicity:

apt install cephadm

That currently installs version v19.2.0 on Ubuntu 24.04. But that version is not tested with this Ubuntu version (see above), and this is a problem. 😀 There is a bug in cephadm v19.2.0 where it is not able to parse AppArmor profiles in /sys/kernel/security/apparmor/profiles. This will cause you a multitude of problems: storage devices will not be discovered, cluster communication is flawed, etc. Please check the Stack Overflow thread for more details. The fix is already on its way for cephadm v19.2.1.
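
You can confirm which version the Ubuntu package gave you before going any further:

cephadm version    # reported 19.2.0 ("squid") in my case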

Besides cephadm I chose not to install additional Ceph packages (e.g., ceph, ceph-volume), as you can run almost any admin command through cephadm shell.
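
For example, once the cluster is bootstrapped (see below), the usual Ceph CLI is available like this without any extra packages on the host:

sudo cephadm shell ceph status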

The Workaround

As this problem is caused by the MongoDB Compass profile containing spaces in its name, I chose to disable that AppArmor profile as a workaround. I am not using MongoDB on these machines, so this should not be a problem. You need to apply the workaround on all Raspberry Pis:

sudo ln -s /etc/apparmor.d/MongoDB_Compass /etc/apparmor.d/disable/
sudo apparmor_parser -R /etc/apparmor.d/MongoDB_Compass
sudo systemctl reload apparmor.service

Check that the MongoDB_Compass profile has been disabled:

sudo systemctl status apparmor.service
sudo cat /sys/kernel/security/apparmor/profiles | grep MongoDB

Bootstrapping the Ceph Cluster

Bootstrapping the Ceph cluster with cephadm on your admin node is straightforward; the IP address is the one of your admin node:

cephadm bootstrap --mon-ip 192.168.1.100
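
Right after the bootstrap you can already peek at the daemons cephadm has started on the admin node (a monitor and a manager, plus a few supporting services):

sudo cephadm shell ceph orch ps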

Adding Hosts

My node layout looks like this:

ceph1 (admin node)   192.168.1.100
ceph2                192.168.1.101
ceph3                192.168.1.102

Cephadm needs a user with a certain set of permissions on the other hosts. By default the root user is used, but you can also configure a different user with narrowed-down permissions. I am just going with the default here.
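
If you do want a dedicated user instead of root, cephadm can be pointed at it; a minimal sketch, where the user name cephadmin is just an example and that user needs passwordless sudo on all hosts:

sudo cephadm shell ceph cephadm set-user cephadmin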

In order to add the other two nodes, you need to configure sshd in /etc/ssh/sshd_config and permit root login: PermitRootLogin yes or, better, PermitRootLogin prohibit-password (a sketch of that change follows after the commands below). After that, you can copy over the SSH keys from the admin node like this:

ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph2
ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph3
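
For reference, one way to apply the sshd change on ceph2 and ceph3 is a small drop-in file; the file name is just an example, and this assumes the default Include for /etc/ssh/sshd_config.d is in place:

echo 'PermitRootLogin prohibit-password' | sudo tee /etc/ssh/sshd_config.d/99-cephadm.conf
sudo systemctl reload ssh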

Add the hosts like this:

sudo cephadm shell ceph orch host add ceph2 192.168.1.101
sudo cephadm shell ceph orch host add ceph3 192.168.1.102

On each of these two nodes Ceph will add a monitor, so you end up with three monitors in total and a highly available Ceph cluster.
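
You can verify that the hosts were added and watch the monitors being scheduled:

sudo cephadm shell ceph orch host ls
sudo cephadm shell ceph orch ls mon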

Adding Devices and Object Storage Daemons (OSDs)

Please be aware that you can only add a new device if these requirements are met: the device has no partitions, no LVM state, is not mounted, does not contain a file system or a leftover Ceph BlueStore OSD, and is larger than 5 GB.

So you had better check whether this is actually the case:

sudo cephadm shell ceph orch device ls --wide --refresh

If there is a problem, you will see it listed in the REJECT REASONS column. You might need to follow up on that by using fdisk to remove partitions or ceph-volume lvm zap to clean up devices. I had no problems adding my three NVMe SSDs on the three Raspberry Pis.
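
If you do need to wipe a disk, the orchestrator can do that for you as well; a sketch, where the host name and device path are examples only (this destroys all data on the device):

sudo cephadm shell ceph orch device zap ceph2 /dev/nvme0n1 --force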

If everything looks fine, you can just add all storage devices and create the OSDs in one go:

sudo cephadm shell ceph orch apply osd --all-available-devices

Checking the Result

Everything should be up and running by now. You can check the results in the web dashboard, which runs on the admin node: https://ceph1:8443/
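
If you prefer the command line, the same information is available there; a healthy three-node setup should report HEALTH_OK with three monitors and three OSDs:

sudo cephadm shell ceph -s
sudo cephadm shell ceph osd tree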

Self-Criticism

Working on the project, I already noticed some things I need to improve. Having the operating system run on the SD cards was probably not the best idea; especially the monitors will cause some serious wear and tear there. So I will probably need to change that at some point, and I have already eyeballed other PCIe to M.2 adapters which support more than one SSD, like this one. That way I could have one SSD for the operating system and one for storage. I admit that the managed 150W PoE switch is quite overkill for this project, but I chose it because I also want to use it for other lab projects in the future. There are many cheaper options out there; for instance, Waveshare offers this cheap 120W PoE switch, which would fully suffice for this project. Also, that DeskPi Rackmount is rather pricey for what it offers, so if you find a cheaper option, I would rather go with that.

Outlook

I will do a follow-up post on configuring the Ceph CSI for the Kubernetes cluster, as this proved not to be that straightforward. I have also already thrown some hard drives into my Proxmox machine and plan to set up another Ceph cluster there. I am interested in doing some performance comparisons between these two Ceph clusters.