
Highly Available LINSTOR Controller with Pacemaker

Part of the design of LINSTOR is that if the central LINSTOR Controller goes down, all the storage remains up and accessible. This should allow ample time to repair the downed system hosting the LINSTOR Controller. Still, in the majority of cases, it is preferable to run the LINSTOR Controller in a container within your cloud or as a VM on your hypervisor platform. However, there may be situations where you want to keep the LINSTOR Controller up and highly available but do not have a container or VM platform in place to rely upon. For situations like this, we can easily leverage DRBD and the Pacemaker/Corosync stack.

If you are familiar with Pacemaker, setting up a clustered LINSTOR Controller should seem pretty straightforward. The only really tricky bit is that we first need to install LINSTOR in order to create the DRBD resource that will hold LINSTOR's own database. It sounds a little chicken-and-egg, I know, but this allows LINSTOR to be aware of, and manage, all DRBD resources.

The example below uses only two nodes, but it could easily be adapted for more. Make sure to install both the LINSTOR Controller and LINSTOR Satellite software on both nodes. The instructions below are by no means a step-by-step guide, but rather just the “special sauce” needed for an HA LINSTOR Controller cluster.

If you are using the LINBIT-provided package repositories, an Ansible playbook is available that entirely automates the deployment of this cluster on RHEL 7 or CentOS 7 systems.
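If you would rather install manually, pull in the LINSTOR, DRBD, and cluster stack packages on both nodes. A rough sketch for RHEL 7/CentOS 7 follows; the exact package names, and where crmsh comes from (the Pacemaker configuration below uses crmsh syntax), depend on your repository setup:

[root@linstora ~]# yum install -y drbd kmod-drbd linstor-controller linstor-satellite linstor-client
[root@linstora ~]# yum install -y pacemaker corosync crmsh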

Create a DRBD resource for the LINSTOR database

We’ll name this resource linstordb and use the already configured pool0 storage pool.

[root@linstora ~]# linstor resource-definition create linstordb
[root@linstora ~]# linstor volume-definition create linstordb 250M
[root@linstora ~]# linstor resource create linstora linstordb --storage-pool pool0
[root@linstora ~]# linstor resource create linstorb linstordb --storage-pool pool0

Stop the LINSTOR Controller and move the database to the DRBD device

Move the database temporarily, mount the DRBD device where LINSTOR expects the database, and move it back.

[root@linstora ~]# systemctl stop linstor-controller
[root@linstora ~]# rsync -avp /var/lib/linstor /tmp/
[root@linstora ~]# mkfs.xfs /dev/drbd/by-res/linstordb/0
[root@linstora ~]# rm -rf /var/lib/linstor/*
[root@linstora ~]# mount /dev/drbd/by-res/linstordb/0 /var/lib/linstor
[root@linstora ~]# rsync -avp /tmp/linstor/ /var/lib/linstor/
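Before continuing, it is worth a quick sanity check that the DRBD resource is healthy and that the database files now live on the DRBD-backed mount, for example:

[root@linstora ~]# drbdadm status linstordb
[root@linstora ~]# mount | grep /var/lib/linstor
[root@linstora ~]# ls /var/lib/linstor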

Cluster everything up in Pacemaker

Please note that we strongly encourage you to utilize tested and working STONITH in all Pacemaker clusters. This example omits it simply because these VMs did not have any fencing devices available.

primitive p_drbd_linstordb ocf:linbit:drbd \
        params drbd_resource=linstordb \
        op monitor interval=29 role=Master \
        op monitor interval=30 role=Slave \
        op start interval=0 timeout=240s \
        op stop interval=0 timeout=100s
primitive p_fs_linstordb Filesystem \
        params device="/dev/drbd/by-res/linstordb/0" directory="/var/lib/linstor" \
        op start interval=0 timeout=60s \
        op stop interval=0 timeout=100s \
        op monitor interval=20s timeout=40s
primitive p_linstor-controller systemd:linstor-controller \
        op start interval=0 timeout=100s \
        op stop interval=0 timeout=100s \
        op monitor interval=30s timeout=100s
ms ms_drbd_linstordb p_drbd_linstordb \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
group g_linstor p_fs_linstordb p_linstor-controller
order o_drbd_before_linstor inf: ms_drbd_linstordb:promote g_linstor:start
colocation c_linstor_with_drbd inf: g_linstor ms_drbd_linstordb:Master
property cib-bootstrap-options: \
        stonith-enabled=false \
        no-quorum-policy=ignore
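The configuration above is in crmsh syntax. Assuming the crm shell is available, it can be loaded from a file and the result inspected roughly like this (the file name is just an example):

[root@linstora ~]# crm configure load update linstordb-cluster.crm
[root@linstora ~]# crm_mon -1r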

We still usually advise leveraging the high-availability features already built into your cloud or VM platform if one is available, but if not, you can always use the above to leverage Pacemaker to make your LINSTOR Controller highly available.

 

Devin Vance
First introduced to Linux back in 1996, and using Linux almost exclusively by 2005, Devin has years of Linux administration and systems engineering under his belt. He has been deploying and improving clusters with LINBIT since 2011. When not at the keyboard, you can usually find Devin wrenching on an American motorcycle or down at one of the local bowling alleys.

A Highly Available LINSTOR Controller for Proxmox

For the high availability setup we describe in this blog post, we assume that you have installed LINSTOR and the Proxmox Plugin as described in the Proxmox section of the user's guide or in our blog post.

The idea is to execute the LINSTOR controller within a VM that is controlled by Proxmox and its HA features, where the storage resides on DRBD, managed by LINSTOR itself.

Preparing the Storage

The first step is to allocate storage for the VM by creating a VM and selecting “Do not use any media” in the “OS” section. The hard disk should reside on DRBD (e.g., “drbdstorage”). Disk space should be at least 2 GB, and for RAM we chose 1 GB. These are the minimal requirements for the appliance LINBIT provides to its customers (see below). If you set up your own controller VM, or resources are not constrained, increase these minimal values. In the following, we assume that the controller VM was created with ID 100, but it is fine if this VM is created later (after you have already created other VMs).
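Once the VM disk has been created on “drbdstorage”, you can check that the backing DRBD resource exists. The resource and device names below assume VM ID 100 and the first disk:

# linstor resource list-volumes
# ls -l /dev/drbd/by-res/vm-100-disk-1/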

LINSTOR Controller Appliance

LINBIT provides an appliance for its customers that can be used to populate the created storage. For the appliance to work, we first create a “Serial Port”: click on “Hardware,” then on “Add,” and finally on “Serial Port.” See the image below:

[Image: proxmox_serial1_controller_vm]

If everything worked as expected, the VM definition should then look like this:

[Image: proxmox_add_serial2_controller_vm]
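If you prefer the command line over the GUI, the same serial port can presumably be added with qm (a sketch, assuming VM ID 100):

# qm set 100 -serial0 socket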

The next step is to copy the VM appliance to the created storage. This can be done with qemu-img. Make sure to replace the VM ID with the correct one:

# qemu-img dd -O raw if=/tmp/linbit-linstor-controller-amd64.img \
 of=/dev/drbd/by-res/vm-100-disk-1/0

After that, you can start the VM and connect to it via the Proxmox VNC viewer. The default user name and password are both “linbit”. Note that we kept the defaults for SSH, so you will not be able to log in to the VM via SSH with a username/password. If you want to enable that (and/or “root” login), enable these settings in /etc/ssh/sshd_config and restart the ssh service. As this VM is based on “Ubuntu Bionic”, you should change your network settings (e.g., static IP) in /etc/netplan/config.yaml. After that, you should be able to SSH into the VM:

[Image: proxmox_ssh_controller_vm]
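For reference, a minimal netplan configuration for a static IP could look like the following sketch. The interface name (ens18) and the gateway/nameserver are assumptions; the address matches the controller IP used in the rest of this post:

# /etc/netplan/config.yaml
network:
  version: 2
  renderer: networkd
  ethernets:
    ens18:
      addresses: [ 10.43.7.254/24 ]
      gateway4: 10.43.7.1
      nameservers:
        addresses: [ 10.43.7.1 ]

Apply it with netplan apply (or reboot the VM).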

Adding the Controller VM to the existing Cluster

In the next step, you add the controller VM to the existing cluster:

# linstor node create --node-type Controller \
 linstor-controller 10.43.7.254

As this special VM will not be managed by the Proxmox Plugin, make sure all hosts have access to that VM’s storage. In our test cluster, we checked the linstor resource list to confirm where the storage was already deployed and then created further assignments via linstor resource create. In our lab consisting of four nodes, we made all resource assignments diskful, but diskless assignments are fine as well. As a rule of thumb, keep the redundancy count at “3” (more usually does not make sense) and assign the rest diskless.
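For example, checking the current placement and adding further assignments might look like this; the node and storage pool names are placeholders from our lab:

# linstor resource list
# linstor resource create proxmox-node3 vm-100-disk-1 --storage-pool drbdpool
# linstor resource create proxmox-node4 vm-100-disk-1 --diskless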

As the storage for this particular VM has to be made available (i.e., drbdadm up), enable the drbd.service on all nodes:

# systemctl enable drbd
# systemctl start drbd

At startup, the linstor-satellite service deletes all of its resource files (*.res) and regenerates them. This conflicts with the drbd service, which needs these resource files to start the controller VM's storage. It is good enough to first bring up the resources via drbd.service and then start linstor-satellite.service. To make the necessary changes, create a drop-in for the linstor-satellite.service via systemctl (do not edit the file directly):

# systemctl edit linstor-satellite
[Unit]
After=drbd.service
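You can confirm that the drop-in is in place (and that the ordering dependency shows up) with:

# systemctl cat linstor-satellite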

Switching to the New Controller

Now, it is time for the final steps — namely switching from the existing controller to the new one in the VM. Stop the old controller service on the old host, and copy the LINSTOR controller database to the VM:

# systemctl stop linstor-controller
# systemctl disable linstor-controller
# scp /var/lib/linstor/* root@10.43.7.254:/var/lib/linstor/

Finally, we can enable the controller in the VM:

# systemctl start linstor-controller # in the VM
# systemctl enable linstor-controller # in the VM

To check if everything worked as expected, you can query the cluster nodes on a host by asking the controller in the VM: linstor --controllers=10.43.7.254 node list. It is perfectly fine that the controller (which is just a controller and not “Combined”) is shown as “OFFLINE”; this might change in the future to something more appropriate.
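If you do not want to pass --controllers on every invocation, the linstor client can also pick up the controller address from the LS_CONTROLLERS environment variable, for example:

# export LS_CONTROLLERS=10.43.7.254
# linstor node list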

As the last – but crucial – step, you need to add the “controllervm” option to /etc/pve/storage.cfg, and change the controller IP:

drbd: drbdstorage
  content images,rootdir
  redundancy 3
  controller 10.43.7.254
  controllervm 100

With the “controllervm” parameter set, the plugin will ignore (or handle specially) actions on the controller VM. Basically, this VM should not be managed by the plugin, so the plugin mostly ignores all actions on the given controller VM ID. However, there is one exception: when you delete the VM in the GUI, it is removed from the GUI, and we did not find a way to keep it visible there. Such delete requests are nevertheless ignored by the plugin, so the VM is not deleted from the LINSTOR cluster, and it is therefore possible to later create a VM with the ID of the old controller. The plugin will just return “OK”, and the old VM with the old data can be used again. To keep it simple, be careful not to delete the controller VM.

Enabling HA for the Controller VM in Proxmox

Currently, we have the controller running as a VM, but we should make sure that one instance of it is started at all times. For that we use Proxmox's HA feature. Click on the VM, then on “More,” and then on “Manage HA.” We set the following parameters for our controller VM:

[Image: promox_manage_ha_controller_vm]
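Roughly the same can be done on the command line with ha-manager; the VM ID and the restart/relocate limits below are just examples:

# ha-manager add vm:100 --state started --max_restart 1 --max_relocate 1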

Final Considerations

As long as there are surviving nodes in your Proxmox cluster, everything should be fine. In case the node hosting the controller VM is shut down or lost, Proxmox HA will make sure the controller is started on another host. The IP of the controller VM should not change; it is up to you as the admin to make sure this is the case (e.g., by setting a static IP, or by always providing the same IP via DHCP on the bridged interface).

One limitation that is not fully handled with this setup is a total cluster outage (e.g., a common power supply failure) with a restart of all cluster nodes. Proxmox is unfortunately pretty limited in that regard. You can enable the “HA Feature” for a VM, and you can define “Start and Shutdown Order” constraints, but the two are completely separate from each other. Therefore, it is difficult to guarantee that the controller VM is up before all the other VMs are started.

It might be possible to work around that by delaying VM startup in the Proxmox plugin until the controller VM is up (i.e., if the plugin is asked to start the controller VM it does so; otherwise it waits and pings the controller). While this is a nice idea, it would fail badly in a serialized, non-concurrent stream of VM start events (plugin calls) in which some other VM is supposed to start, and therefore blocks, before the controller VM is even scheduled to be started. That would obviously result in a deadlock.

We will discuss options with Proxmox, but we think the presented solution is valuable in typical use cases as is, especially compared to the complexity of a Pacemaker setup. Use cases where the whole cluster is not expected to go down at the same time are covered. And even if the whole cluster does go down, only the automatic startup of the VMs is affected when the cluster comes back up. In such a scenario, the admin just has to wait until the Proxmox HA service starts the controller VM. After that, all VMs can be started manually or scripted on the command line.

Roland Kammerer
Software Engineer at LINBIT
Roland Kammerer studied technical computer science at the Vienna University of Technology and graduated with distinction. Currently, he is a PhD candidate with a research focus on time-triggered real-time systems and works for LINBIT in the DRBD development team.