

How to Setup LINSTOR with OpenNebula

This post will guide you through the setup of the Linstor - OpenNebula Addon. After completing it you will be able to easily live-migrate virtual machines between OpenNebula nodes, and additionally, have data redundancy.

Setup Linstor with OpenNebula

This post assumes that you already have OpenNebula installed and running on all of your nodes. First, I will give you a quick guide to installing LINSTOR; for more detailed documentation, please read the DRBD User’s Guide. The second part will show you how to add a LINSTOR image and system datastore to OpenNebula.

We will assume the following node setup:

Node name   IP         Role
alpha       10.0.0.1   Controller / ON front-end
bravo       10.0.0.2   Virtualization host
charlie     10.0.0.3   Virtualization host
delta       10.0.0.4   Virtualization host

Make sure you have configured two LVM thin storage pools named linstorpool/thin_image and linstorpool/thin_system on all of your nodes.
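If these pools do not exist yet, they can be created with plain LVM. Below is a minimal sketch that assumes each node has a spare disk at /dev/sdb and that 100G per pool is enough; the device name and the sizes are assumptions, so adjust them to your hardware:

vgcreate linstorpool /dev/sdb
lvcreate -L 100G -T linstorpool/thin_system
lvcreate -L 100G -T linstorpool/thin_image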

LINSTOR setup

Install LINSTOR packages

The easiest setup is to install the linstor-controller on the same node as the OpenNebula cloud front-end. The linstor-opennebula package contains our OpenNebula driver, and therefore, is essential on the OpenNebula cloud front-end node. On this node install the following packages:

apt install drbd-dkms drbd-utils python-linstor linstor-satellite linstor-client linstor-controller linstor-opennebula

After the installation completes start the linstor-controller and enable the service:

systemctl start linstor-controller
systemctl enable linstor-controller

On all other virtualization nodes you do not need the linstor-controller, linstor-client, or linstor-opennebula packages:

apt install drbd-dkms drbd-utils python-linstor linstor-satellite

For all nodes (including the controller) you have to start and enable the linstor-satellite:

systemctl start linstor-satellite
systemctl enable linstor-satellite

Now all LINSTOR-related services should be running.
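If you want to verify this before continuing, a quick sanity check with systemctl might look like the following (the first command only applies to the controller/front-end node):

systemctl is-active linstor-controller
systemctl is-active linstor-satellite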

Adding and configuring LINSTOR nodes

All nodes that should work as virtualization nodes need to be added to LINSTOR, so that storage can be distributed and activated on all nodes:

linstor node create alpha 10.0.0.1 --node-type Combined
linstor node create bravo 10.0.0.2
linstor node create charlie 10.0.0.3
linstor node create delta 10.0.0.4

Now we will configure the system and image lvm-thin pools with LINSTOR:

linstor storage-pool create lvmthin alpha open_system linstorpool/thin_system
linstor storage-pool create lvmthin bravo open_system linstorpool/thin_system
linstor storage-pool create lvmthin charlie open_system linstorpool/thin_system
linstor storage-pool create lvmthin delta open_system linstorpool/thin_system

linstor storage-pool create lvmthin alpha open_image linstorpool/thin_image
linstor storage-pool create lvmthin bravo open_image linstorpool/thin_image
linstor storage-pool create lvmthin charlie open_image linstorpool/thin_image
linstor storage-pool create lvmthin delta open_image linstorpool/thin_image

For testing we can now try to create a dummy test resource:

linstor resource-definition create dummy
linstor volume-definition create dummy 10M
linstor resource create dummy --auto-place 3 -s open_image

If everything went well with the above commands, you should be able to see a resource created on 3 nodes in the open_image lvm-thin storage pool:

linstor resource list-volumes

Now we can delete the created dummy resource:

linstor resource-definition delete dummy

LINSTOR is now set up and ready to be used by OpenNebula.

OpenNebula LINSTOR datastores

OpenNebula uses different types of datastores: system, image and files.

LINSTOR supports the system and image datastore types.

  • The system datastore stores a small context image that holds all information needed to run a virtual machine (VM) on a node.
  • The image datastore, as its name reveals, stores the VM images.

OpenNebula doesn’t need to be configured with a LINSTOR system datastore; it will also work with its default system datastore, but using a LINSTOR system datastore gives you some data redundancy advantages.

Setup LINSTOR datastore drivers

As LINSTOR is an addon driver for OpenNebula, the LINSTOR OpenNebula driver needs to be registered with it. To do so, modify /etc/one/oned.conf and add linstor to the TM_MAD and DATASTORE_MAD sections.

TM_MAD = [
  executable = "one_tm",
  arguments = "-t 15 -d dummy,lvm,shared,fs_lvm,qcow2,ssh,vmfs,ceph,linstor"
]

Note that for the DATASTORE_MAD section the linstor driver has to be specified twice: once for the image datastore and once for the system datastore.

DATASTORE_MAD = [
    EXECUTABLE = "one_datastore",
    ARGUMENTS  = "-t 15 -d dummy,fs,lvm,ceph,dev,iscsi_libvirt,vcenter,linstor -s shared,ssh,ceph,fs_lvm,qcow2,vcenter,linstor"
]

And finally at the end of the configuration file, add new TM_MAD_CONF and DS_MAD_CONF sections for the linstor driver:

TM_MAD_CONF = [
    name = "linstor", ln_target = "NONE", clone_target = "SELF", shared = "yes", ALLOW_ORPHANS="yes"
]

DS_MAD_CONF = [
    NAME = "linstor", REQUIRED_ATTRS = "BRIDGE_LIST", PERSISTENT_ONLY = "NO", MARKETPLACE_ACTIONS = "export"
]

Now restart the OpenNebula service.

Adding LINSTOR datastore drivers

After we registered the LINSTOR driver with OpenNebula we can add the image and system datastore.

For the system datastore we will create a configuration file and add it with the onedatastore tool. If you want to use more than 2 replicas, just edit the LINSTOR_AUTO_PLACE value.

cat >system_ds.conf <<EOI
NAME = linstor_system_auto_place
TM_MAD = linstor
TYPE = SYSTEM_DS
LINSTOR_AUTO_PLACE = 2
LINSTOR_STORAGE_POOL = "open_system"
BRIDGE_LIST = "alpha bravo charlie delta"
EOI

onedatastore create system_ds.conf

And we do nearly the same for the image datastore:

cat >image_ds.conf <<EOI
NAME = linstor_image_auto_place
DS_MAD = linstor
TM_MAD = linstor
TYPE = IMAGE_DS
DISK_TYPE = BLOCK
LINSTOR_AUTO_PLACE = 2
LINSTOR_STORAGE_POOL = "open_image"
BRIDGE_LIST = "alpha bravo charlie delta"
EOI

onedatastore create image_ds.conf

Now you should see 2 new datastores in the OpenNebula web front-end that are ready to use.
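You can also confirm this on the command line with OpenNebula's datastore tool:

onedatastore list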

Usage and Notes

The new datastores can be used in the usual OpenNebula datastore selections and should support all OpenNebula features.

The LINSTOR datastores also have some configuration options, which are described on the driver's GitHub repository page.

Data distribution

The interested reader can check which nodes were selected for the replicas via linstor resource list.

linstor resource list

While interesting, it is important to know that the storage can be accessed by all nodes in the cluster via a DRBD feature called “diskless clients”. So let’s assume “alpha” and “bravo” had the most free space and were selected, and the VM was created on node “bravo”. Via the low level tool drbdadm status we now see that the resource is created on two nodes (i.e., “alpha” and “bravo”) and the DRBD resource is in “Primary” role on “bravo”.

Now we want to migrate the VM from “bravo” to node “charlie”. This is again done via a few clicks in the GUI, but the interesting steps happen behind the scenes: The storage plugin realizes that it has access to the data on “alpha” and “bravo” (our two replicas) but also needs access on “charlie” to execute the VM. The plugin therefore creates a diskless assignment on “charlie”. When you execute drbdadm status on “charlie”, you see that now three nodes are involved in the overall picture:

  • Alpha with storage in Secondary role
  • Bravo with storage in Secondary role
  • Charlie as a diskless client in Primary role

Diskless clients are created (and deleted) on demand without further user interaction, besides moving around VMs in the GUI. This means that if you now move the VM back to “bravo”, the diskless assignment on “charlie” gets deleted as it is no longer needed.

If you would have moved the VM from “charlie” to “delta”, the diskless assignment for “charlie” would have been deleted, and a new one for “delta” would have been created.

For you it is probably even more interesting that all of this including VM migration happens within seconds without moving the actual replicated storage contents.
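For completeness: if you prefer the command line over the web front-end, the same live migration can be triggered with the onevm tool; the VM ID and host name below are placeholders:

onevm migrate --live 42 charlie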

Check this for LINSTOR and OpenNebula:

Rene Peinthor
Software developer
Rene was one of the first developers to see a DRBD resource deployed by LINSTOR and has been a software developer at LINBIT since 2017.
While not squashing bugs in LINSTOR, Rene is either climbing or paragliding down a mountain.

LINBIT SDS Adds Disaster Recovery and Support for Kubernetes

LINBIT SDS (Linux SDS) will showcase cloud-native enterprise storage management at KubeCon + CloudNativeCon in Seattle

Beaverton, OR, Dec. 3, 2018 – LINBIT enhances open source software-defined storage (SDS) by providing disaster recovery (DR) replication for critical data. LINBIT SDS is an enterprise-class storage management solution designed for cloud and container storage workloads.

To simplify administration, enhance user experience, and accelerate integration with other software, LINBIT SDS relies on the pre-existing storage management capabilities native to Linux, such as LVM and DRBD. These capabilities are complemented by LINSTOR, a feature-rich volume management software. One supported storage tool is DRBD, the in-kernel block level data-replication for Linux. By announcing support for DRBD Proxy, LINBIT extends replication to disaster recovery scenarios since DRBD Proxy enables fast and reliable data replication over any distance by resolving network communications and handling data access latencies.

“LINBIT SDS is rapidly becoming the reliable, high performance, and economical choice for enterprise and cloud workloads,” said Brian Hellman, COO of LINBIT. “With simplified support for DR, LINBIT SDS is surpassing the costly and complex proprietary cloud storage solutions.”

LINBIT SDS provides a host of capabilities to manage persistent block storage for Kubernetes environments. It supports logical volume management (LVM) snapshots, which enhance application availability while minimizing data loss; thin provisioning, which improves efficient resource utilization in virtualized environments; and volume management, which simplifies tasks such as adding, removing, or replicating storage volumes.

LINBIT SDS supports Kubernetes

The Linux based SDS solution works with the leading cloud projects Kubernetes, OpenStack, and OpenNebula, as well as a range of virtualization platforms, and as a stand-alone product. Learn more about how the software works by watching a short video-demo here:

Persistent Kubernetes Storage for Databases (MySQL) with LINSTOR and DRBD (Demo)

OpenStack Cinder: Open-Source Volume Management with LINSTOR and DRBD

Linux Disaster Recovery Replication with DRBD Proxy (Demo)

Kubernetes + LINSTOR Container Failover (Demo)

LINBIT is a member of the Linux Foundation and is proud to support KubeCon, a Linux Foundation conference. Visit us at KubeCon + CloudNativeCon at booth #S7, December 11th-13th, 2018 in Seattle.

ABOUT LINBIT

LINBIT is the force behind DRBD, the de facto open standard for High Availability (HA) software for enterprise and cloud computing. The LINBIT DRBD software is deployed in millions of mission-critical environments worldwide to provide High Availability (HA), Geo-Clustering for Disaster Recovery (DR), and Software Defined Storage (SDS) for OpenStack and OpenNebula based clouds. Don't be shy. Visit us at LINBIT.com.

###

LINBIT DR and LINSTOR demo

Demo of Extending LINSTOR Managed DRBD Volume to a DR Node

In this video Matt Kereczman from LINBIT combines components of LINBIT SDS and LINBIT DR to demonstrate extending an existing LINSTOR-managed DRBD volume to a disaster recovery node located in a geographically separated datacenter, via LINSTOR and DRBD Proxy.

Watch the video:

He’s already created a LINSTOR cluster on four nodes: linstor-a, linstor-b, linstor-c and linstor-dr.

You can see that linstor-dr is in a different network than our other three nodes. This network exists in the DR DC, which is connected to our local DC via a 40Mb/s WAN link.

He has a single DRBD resource defined, which is currently replicated synchronously between the three peers in our local datacenter. He lists out his LINSTOR-managed resources and volumes; the volume is currently mounted on linstor-a:

Before he adds a replica of this volume to the DR node in his DR datacenter, he’ll quickly test the write throughput of his DRBD device, so he has a baseline of how well it should perform.

He uses dd to run the test.
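The exact invocation is shown in the video; a typical direct-I/O throughput test with dd might look like the following sketch, where the mount point /mnt/drbd and the sizes are assumptions:

dd if=/dev/zero of=/mnt/drbd/testfile bs=1M count=1024 oflag=direct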


LINBIT USA Celebrates 10 Year Anniversary

LINBIT US is celebrating a decade of service and growth. Ten years ago, we started our journey with you from a newly established office in the Pacific Northwest. In that time, we have moved into new offices, grown our team to four times its size, built some really great software, and most importantly, met, collaborated with, and served some of the most sophisticated customers along the way. Here's a snapshot of some of the major milestones, told in the present tense.

2010: Our bread and butter has always been High Availability. Our LINBIT HA software, DRBD, enters the mainline Linux kernel with release 2.6.33. This promises to be a standout event that makes enterprise-grade HA a standard capability within Linux and puts the open source community on par with the best of the proprietary systems out there.

2015: Fast forward to 2015. LINBIT is a company that is actually being talked about as the best solution for huge enterprises! Hundreds of thousands of servers depend on the replication that DRBD provides. All our customers are doing really cool work, and some of them are very well known, such as Cisco and Google. We are forming strong partnerships across North and South America (think Red Hat and SUSE).

New Horizon: Disaster Recovery

2016: Not only is the LINBIT HA product a success, but our new product focused on disaster recovery, DRBD Proxy, is  proving to be incredibly useful to companies who need to replicate data across distances. LINBIT is having wonderful success in providing clients peace of mind in case a disaster strikes, or perhaps a clumsy admin pulls on some cables they weren’t supposed to be pulling on! Oh, and we can’t forget our fun videos that go along with these products: LINBIT DR, LINBIT HA, and LINBIT SDS.

More in 2016: The official release of DRBD9 to the public. A huge move for enterprises looking to have multiple replicas of their data (up to 32!). Now, companies can implement software-defined storage (SDS) for creating, managing and running a cloud storage environment.

New Kid on the Block: LINSTOR

2018: Now that SDS is a feature, many clients are looking for it. LINBIT is making it even easier, and more accessible, with the release of LINSTOR. With this, everything is automated. Deploying a DRBD volume has never been easier.

2018: At this point we would be remiss if we didn’t mention that LINSTOR has Flex Volume & External Provisioner drivers for Kubernetes. We now provide persistent storage to high performance containerized applications! Here is a LINSTOR demo, showing you just how quick and easy it is to deploy a DRBD cluster with 20 resources.

Now: A new guide describes  DRBD for the Microsoft Azure cloud service. We have partners and resellers who have end clients running Windows servers that need HA. One of our engineers even created a video of an NFS failover in Azure!

What else? There is almost too much to say about the past 10 years and the amount of growth and change is astonishing. However, at our core, we are the same. We believe in open source. In building software that turns the difficult into fast, robust, and easy. In our clients. In our company.

“We are grateful”

During a conversation at Red Hat Summit this year, LINBIT COO Brian Hellman was asked how long he had been at LINBIT. "I replied, '10 years in September.' The gentleman was surprised: 'That's a long time, especially in the tech industry.' To which I replied, 'I love what I do and the people I work with: not only the members of the LINBIT team, but also our customers, partners, and our extended team. Without them we wouldn't be here; they make it all possible, and for that we are grateful.'"

To whomever is reading this, wherever you are, you were part of it. You ARE part of it! So a big thank you for reading, caring, and hopefully using LINBIT HA, LINBIT DR, or LINBIT SDS. Cheers to another 10 years!


Kelsey Swan
Kelsey turns her personal passion for connecting with people into supporting LINBIT clients. As the Accounts Manager for LINBIT USA, Kelsey engages with customers to provide them with the best experience possible. From enterprise companies to mom-and-pop shops, Kelsey ensures the implementation of LINBIT products goes smoothly. Doing what is best for the client is her #1 priority.

Replicating storage volumes on Scaleway ARM with LINSTOR

I’ve been using Scaleway for a while as a platform to spin-up both personal and work machines, mainly because they’re good value and easy to use. Scaleway offers a wide selection of Aarch64 and x86 machines at various price points, however none of these VMs are replicated – not even with RAID at the hardware level – you’re expected to handle that all yourself. Since ARM servers have been making headlines for several years as a competing architecture to x86 in the data center, I thought it would be interesting to set up replication across two ARM Scaleway VMs with DRBD and LINSTOR.

It’s worth pointing out here that if you’re planning on building a production HA environment on Scaleway, you should also reach out to their support team and have them confirm that your replicated volumes aren’t actually sitting on the same spinning disk in case of drive failure, as advised in their FAQ.

Preparing VMs


First, we need a couple of VMs with additional storage volumes to replicate. The ARM64-2GB VM doesn’t allow for mounting additional volumes, so let’s go for the next one up, and add an additional 50GB LSSD volume.


I've gone with an Ubuntu image; if you selected an RPM-based image, substitute package manager commands accordingly. Run the following commands on all VMs (in my case I have two, and will be using the first as both my controller and a satellite node).

$ sudo apt update && sudo apt upgrade

In this case we'll be deploying DRBD nodes with LINSTOR. We need DRBD9 to do this, and we can't build its custom kernel module without first getting some prerequisite files for Scaleway's custom kernel and preparing the build environment. Scaleway provides a recommended script to run – we need to save that script and run it before installing DRBD9. I've put it in a file on GitHub to make things simple:

$ sudo apt install -y build-essential libssl-dev
$ wget https://raw.githubusercontent.com/dabukalam/scalewaycustommodule/master/scalewaycustommodule
$ chmod +x scalewaycustommodule && sudo ./scalewaycustommodule

Getting LINSTOR

Once that’s done, we can add the LINBIT community repository and install DRBD, LINSTOR, and LVM:

$ sudo add-apt-repository -y ppa:linbit/linbit-drbd9-stack
$ sudo apt update
$ sudo apt install drbd-dkms linstor-satellite linstor-client lvm2

Now I can start the LINSTOR satellite service with:

$ sudo systemctl enable --now linstor-satellite

And make sure the VMs can see each other by adding the other node to each hosts file:

(Screenshot: /etc/hosts entries on both VMs)
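If you are not using DNS, a couple of /etc/hosts entries on each VM are enough. A minimal sketch using the private IPs that appear later in this post (adding both lines on both VMs does no harm):

$ echo "10.10.43.13 drbd-arm" | sudo tee -a /etc/hosts
$ echo "10.10.25.5 drbd-arm-2" | sudo tee -a /etc/hosts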

Let’s make sure LVM is running and create a volume group for LINSTOR on our additional volume:

$ sudo systemctl enable --now lvm2-lvmetad.service
$ sudo systemctl enable --now lvm2-lvmetad.socket
$ sudo vgcreate sw_ssd /dev/vdb

That’s it for commands you need to run on both nodes. From now on we’ll be running commands on our favorite VM. LINSTOR has four node types – Controller, Auxiliary, Combined, and Satellite. Since I only have two nodes, one will be Combined, and one will be a Satellite. Combined here means that the node is both a Controller and a Satellite.

Adding nodes to the LINSTOR cluster

So on our favorite VM, which we’re going to use as the combined node, we add the local host to the LINSTOR cluster as a combined node, and the other as a satellite:

$ sudo apt install -y linstor-controller
$ sudo systemctl enable --now linstor-controller
$ linstor node create --node-type Combined drbd-arm 10.10.43.13
$ linstor node create --node-type Satellite drbd-arm-2 10.10.25.5
$ linstor node list

It's worth noting here that you can run commands to manage LINSTOR on any node; just make sure you have the controller node exported as a variable:

drbd-arm-2:~$ export LS_CONTROLLERS=drbd-arm

You should now have something that looks like this:

(Screenshot: linstor node list output)

Now that we have our LINSTOR cluster set up, we can create a storage pool named 'swpool' across the nodes, referencing each node name, specifying that we want LVM, and giving the volume group name:

$ linstor storage-pool create drbd-arm swpool lvm sw_ssd
$ linstor storage-pool create drbd-arm-2 swpool lvm sw_ssd

We can then define new resource and volume types, and use them to create the resource. You can perform a whole range of operations at this point including manual node placement and specifying storage pools. Since we only have one storage pool, LINSTOR will automatically select that for us. I only have two nodes so I’ll just autoplace my storage cluster across two.

$ linstor resource-definition create backups
$ linstor volume-definition create backups 40G
$ linstor resource create backups --auto-place 2

LINSTOR will now handle all the resource creation automagically across all our nodes, including dealing with LVM and DRBD. If all succeeds, you should now be able to see your resources. They’ll be inconsistent while DRBD syncs them up. You can also now see the DRBD resources by running drbdmon. Once it’s finished syncing you’ll see a list of your replicated nodes as below (only drbd-arm-2 in my case):

You can now mount the drive on any of the nodes and write to your new replicated storage cluster.

$ linstor resource list-volumes

(Screenshot: linstor resource list-volumes output showing the new resource)

In this case the device name is /dev/drbd1000, so once we create a filesystem on it and mount it, we can write to our new replicated storage cluster.

$ sudo mkfs /dev/drbd1000
$ sudo mount /dev/drbd1000 /mnt
$ sudo touch /mnt/file

 

 

Danny Abukalam on Linkedin
Danny Abukalam
Danny is a Solutions Architect at LINBIT based in Manchester, UK. He works in conjunction with the sales team to support customers with LINBIT's products and services. Danny has been active in the OpenStack community for a few years, organising events in the UK including the Manchester OpenStack Meetup and OpenStack Days UK. In his free time, Danny likes hunting for extremely hoppy IPAs and skiing, not at the same time.

A Highly Available LINSTOR Controller for Proxmox

For the High Availability setup we describe in this blog post, we assume that you installed LINSTOR and the Proxmox Plugin as described in the Proxmox section of the user's guide or our blog post.

The idea is to execute the LINSTOR controller within a VM that is controlled by Proxmox and its HA features, where the storage resides on DRBD, managed by LINSTOR itself.

Preparing the Storage

The first step is to allocate storage for the VM by creating a VM and selecting “Do not use any media” on the “OS” section. The hard disk should reside on DRBD (e.g., “drbdstorage”). Disk space should be at least 2GB, and for RAM we chose 1GB. These are the minimal requirements for the appliance LINBIT provides to its customers (see below). If you set up your own controller VM, or resources are not constrained, increase these minimal values. In the following, we assume that the controller VM was created with ID 100, but it is fine if this VM is created later (after you have already created other VMs).

LINSTOR Controller Appliance

LINBIT provides an appliance for its customers that can be used to populate the created storage. For the appliance to work, we first create a "Serial Port": click on "Hardware", then on "Add", and finally on "Serial Port". See the image below:

(Screenshot: adding a serial port to the controller VM)

If everything worked as expected, the VM definition should then look like this:

(Screenshot: the VM definition with the serial port added)

The next step is to copy the VM appliance to the created storage. This can be done with qemu-img. Make sure to replace the VM ID with the correct one:

# qemu-img dd -O raw if=/tmp/linbit-linstor-controller-amd64.img \
 of=/dev/drbd/by-res/vm-100-disk-1/0

After that, you can start the VM and connect to it via the Proxmox VNC viewer. The default user name and password are both “linbit”. Note that we kept the defaults for SSH, so you will not be able to log in to the VM via SSH and username/password. If you want to enable that (and/or “root” login), enable these settings in /etc/ssh/sshd_config and restart the ssh service. As this VM is based on “Ubuntu Bionic”, you should change your network settings (e.g., static IP) in /etc/netplan/config.yaml. After that you should be able to ssh to the VM:

(Screenshot: SSH session to the controller VM)
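As mentioned above, the appliance's network configuration lives in /etc/netplan/config.yaml. A minimal sketch for a static IP might look like this; the interface name ens3, the prefix length, and the gateway are assumptions, so adjust them to your environment:

# /etc/netplan/config.yaml
network:
  version: 2
  ethernets:
    ens3:
      addresses: [10.43.7.254/24]
      gateway4: 10.43.7.1

Apply the change with netplan apply (or simply reboot the VM).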

Adding the Controller VM to the existing Cluster

In the next step, you add the controller VM to the existing cluster:

# linstor node create --node-type Controller \
 linstor-controller 10.43.7.254

As this special VM will not be managed by the Proxmox Plugin, make sure all hosts have access to that VM's storage. In our test cluster, we checked the linstor resource list to confirm where the storage was already deployed and then created further assignments via linstor resource create. In our lab consisting of four nodes, we made all resource assignments diskful, but diskless assignments are fine as well. As a rule of thumb keep the redundancy count at "3" (more usually does not make sense), and assign the rest diskless.
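For example, assuming the controller VM's disk is the LINSTOR resource vm-100-disk-1 (as in the qemu-img step above) and that the storage pool is called drbdpool, a further diskful replica and a diskless assignment might be created like this; the node names are placeholders, and the exact flag names can differ between linstor client versions:

# linstor resource create nodeD vm-100-disk-1 --storage-pool drbdpool
# linstor resource create nodeE vm-100-disk-1 --diskless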

As the storage for this particular VM has to be made available (i.e., drbdadm up), enable the drbd.service on all nodes:

# systemctl enable drbd
# systemctl start drbd

At startup, the `linstor-satellite` service deletes all of its resource files (*.res) and regenerates them. This conflicts with the drbd service, which needs these resource files to start the controller VM. It is good enough to first bring up the resources via drbd.service and then start linstor-satellite.service. To make the necessary changes, you need to create a drop-in for the linstor-satellite.service via systemctl (do not edit the file directly).

# systemctl edit linstor-satellite
[Unit]
After=drbd.service

Switching to the New Controller

Now, it is time for the final steps — namely switching from the existing controller to the new one in the VM. Stop the old controller service on the old host, and copy the LINSTOR controller database to the VM:

# systemctl stop linstor-controller
# systemctl disable linstor-controller
# scp /var/lib/linstor/* [email protected]:/var/lib/linstor/

Finally, we can enable the controller in the VM:

# systemctl start linstor-controller # in the VM
# systemctl enable linstor-controller # in the VM

To check if everything worked as expected, you can query the cluster nodes on a host by asking the controller in the VM: linstor --controllers=10.43.7.254 node list. It is perfectly fine that the controller (which is just a controller and not “combined”) is shown as “OFFLINE”. Still, this might change in the future to something more appropriate.

As the last – but crucial – step, you need to add the “controllervm” option to /etc/pve/storage.cfg, and change the controller IP:

drbd: drbdstorage
  content images,rootdir
  redundancy 3
  controller 10.43.7.254
  controllervm 100

By setting the "controllervm" parameter, the plugin ignores (or handles specially) actions on the controller VM. Basically, this VM should not be managed by the plugin, so the plugin mainly ignores all actions on the given controller VM ID. However, there is one exception: when you delete the VM in the GUI, it is removed from the GUI, and we did not find a way to keep it visible there. Such delete requests are ignored by the plugin, though, so the VM will not be deleted from the LINSTOR cluster. Therefore, it is possible to later create a VM with the ID of the old controller. The plugin will just return "OK", and the old VM with the old data can be used again. To keep it simple, be careful not to delete the controller VM.

Enabling HA for the Controller VM in Proxmox

Currently, we have the controller executed as VM, but we should make sure that one instance of the VM is started at all times. For that we use Proxmox’s HA feature. Click on the VM; then on “More”; and then on “Manage HA.” We set the following parameters for our controller VM:

(Screenshot: Proxmox HA settings for the controller VM)

Final Considerations

As long as there are surviving nodes in your Proxmox cluster, everything should be fine. In case the node hosting the controller VM is shut down or lost, Proxmox HA will make sure the controller is started on another host. The IP of the controller VM should not change. It is up to you as admin to make sure this is the case (e.g., setting a static IP, or always providing the same IP via dhcp on the bridged interface).

One limitation that is not fully handled with this setup is a total cluster outage (e.g., a common power supply failure) with a restart of all cluster nodes. Proxmox is unfortunately pretty limited in that regard. You can enable the "HA Feature" for a VM, and you can define "Start and Shutdown Order" constraints, but both are completely separate from each other. Therefore it is difficult to guarantee that the controller VM is up before all the other VMs are started.

It might be possible to work around that by delaying VM startup in the Proxmox plugin until the controller VM is up (i.e., if the plugin is asked to start the controller VM it does so, otherwise it waits and pings the controller). While this is a nice idea, it would fail in a serialized, non-concurrent VM start/plugin call event stream where some VM is scheduled to start (and then blocks) before the controller VM. That would obviously result in a deadlock.

We will discuss options with Proxmox, but we think the presented solution is valuable in typical use cases as is, especially compared to the complexity of a Pacemaker setup. Use cases where one can expect that the whole cluster does not go down at the same time are covered. And even if the whole cluster does go down, only the automatic startup of the VMs would not work when the cluster comes back up. In such a scenario, the admin just has to wait until the Proxmox HA service starts the controller VM. After that, all VMs can be started manually or via script on the command line.

Roland Kammerer
Software Engineer at Linbit
Roland Kammerer studied technical computer science at the Vienna University of Technology and graduated with distinction. Currently, he is a PhD candidate with a research focus on time-triggered realtime-systems and works for LINBIT in the DRBD development team.

 


How to setup LINSTOR on Proxmox VE

In this technical blog post, we show you how to integrate DRBD volumes in Proxmox VE via a storage plugin developed by LINBIT. The advantages of using DRBD include a configurable number of data replicas (e.g., 3 copies in a 5 node cluster) and access to the data on every node, which enables very fast VM live-migrations (usually only a few seconds, depending on memory pressure).

Setup

The rest of this post assumes that you have already set up Proxmox VE (the LINBIT example uses 4 nodes), and have created a PVE cluster consisting of all nodes. While this post is not meant to  replace the DRBD User’s Guide, we try to show a complete setup.

The setup consists of two important components:

  1. LINSTOR, which manages DRBD resource allocation.
  2. The linstor-proxmox plugin, which implements the Proxmox VE storage plugin API and executes LINSTOR commands.

In order for the plugin to work, you must first create a LINSTOR cluster.

LINSTOR Cluster

We assume here that you have already set up the LINBIT Proxmox repository as described in the user's guide. If you have not yet installed the required components, execute the following commands on all cluster nodes. First, we need the low-level infrastructure (i.e., the DRBD9 kernel module and drbd-utils):

apt install pve-headers
apt install drbd-dkms drbd-utils
rmmod drbd; modprobe drbd
grep -q drbd /etc/modules || echo "drbd" >> /etc/modules

The next step is to install LINSTOR:

apt install linstor-controller linstor-satellite linstor-client
systemctl start linstor-satellite
systemctl enable linstor-satellite

Now, decide which of your hosts should be the current controller node and enable the linstor-controller service on that particular node only:

systemctl start linstor-controller
systemctl enable linstor-controller

Volume creation

Obviously, DRBD needs storage to create volumes. In this post we assume a setup where all nodes contain an LVM-thinpool called drbdpool. In our sample setup, we created it on the pve volume group, but in your setup, you might have a different storage topology. On the node that runs the controller service, execute the following commands to add your nodes:

linstor node create alpha 10.0.0.1 --node-type Combined
linstor node create bravo 10.0.0.2 --node-type Combined
linstor node create charlie 10.0.0.3 --node-type Combined
linstor node create delta 10.0.0.4 --node-type Combined

“Combined” means that this node is allowed to execute a LINSTOR controller and/or a satellite, but a node does not have to execute both. So it is safe to specify “Combined”; it does not influence the performance or the number of services started.

The next step is to configure a storage pool definition. As described in the User’s guide, most LINSTOR objects consist of a “definition” and then concrete instances of such a definition:

linstor storage-pool-definition create drbdpool

Now is a good time to mention that the LINSTOR client provides handy shortcuts for its sub-commands. The previous command could have been written as linstor spd c drbdpool. The next step is to register every node's storage pool:

for n in alpha bravo charlie delta; do \
linstor storage-pool create $n drbdpool lvmthin pve/drbdpool; \
done

DRBD resource creation

After that we are ready to create our first real DRBD resource:

linstor resource-definition create first
linstor volume-definition create first 10M --storage-pool drbdpool
linstor resource create alpha first
linstor resource create bravo first

Now, check with drbdadm status that  “alpha” and “bravo” contain a replicated DRBD resource called “first”. After that this dummy resource can be deleted on all nodes by deleting its resource definition:

linstor resource-definition delete -q first

LINSTOR Proxmox VE Plugin Setup

As DRBD and LINSTOR are already set up, the only things missing are installing the plugin itself and configuring it.

apt install linstor-proxmox

The plugin is configured via the file /etc/pve/storage.cfg:

drbd: drbdstorage
  content images,rootdir
  redundancy 2
  controller 10.0.0.1

It is not necessary to copy that file to the other nodes, as /etc/pve is already a replicated file system. After the configuration is done, you should restart the following service:

systemctl restart pvedaemon

After this setup is done, you are able to create virtual machines backed by DRBD from the GUI. To do so, select “drbdstorage” as storage in the “Hard Disk” section of the VM. LINSTOR selects the nodes that have the most free storage to create the replicated backing devices.

Distribution

The interested reader can check which ones were selected via LINSTOR resource list. While interesting, it is important to know that the storage can be accessed by all nodes in the cluster via a DRBD feature called “diskless clients”. So let’s assume “alpha” and “bravo” had the most free space and were selected, and the VM was created on node “bravo”. Via the low level tool drbdadm status we now see that the resource is created on two nodes (i.e., “alpha” and “bravo”) and the DRBD resource is in “Primary” role on “bravo”.

Now we want to migrate the VM from “bravo” to node “charlie”. This is again done via a few clicks in the GUI, but the interesting steps happen behind the scenes: The storage plugin realizes that it has access to the data on “alpha” and “bravo” (our two replicas) but also needs access on “charlie” to execute the VM. The plugin therefore creates a diskless assignment on “charlie”. When you execute drbdadm status on “charlie”, you see that now three nodes are involved in the overall picture:

• Alpha with storage in Secondary role
• Bravo with storage in Secondary role
• Charlie as a diskless client in Primary role

Diskless clients are created (and deleted) on demand without further user interaction, besides moving around VMs in the GUI. This means that if you now move the VM back to “bravo”, the diskless assignment on “charlie” gets deleted as it is no longer needed.

If you would have moved the VM from “charlie” to “delta”, the diskless assignment for “charlie” would have been deleted, and a new one for “delta” would have been created.

For you it is probably even more interesting that all of this including VM migration happens within seconds without moving the actual replicated storage contents.
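For completeness: the same live migration can also be triggered from the Proxmox command line with qm; the VM ID below is a placeholder:

qm migrate 101 charlie --online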

Next Steps

So far, we created a replicated and highly-available setup for our VMs, but the LINSTOR controller and especially its database are not highly-available. In a future blog post, we will describe how to make the controller itself highly-available by only using software already included in Proxmox VE (i.e., without introducing complex technologies like Pacemaker). This will be achieved with a dedicated controller VM that will be provided by LINBIT as an appliance.

Roland Kammerer
Software Engineer at Linbit
Roland Kammerer studied technical computer science at the Vienna University of Technology and graduated with distinction. Currently, he is a PhD candidate with a research focus on time-triggered realtime-systems and works for LINBIT in the DRBD development team.

Split Brain? Never Again! A New Solution for an Old Problem: DRBD Quorum

While attending OpenStack Summit in Atlanta, I sat in on a talk about the difficulties of implementing High Availability (HA) clusters. At one point, the speaker presented a picture of a split-brain and discussed the challenges of resolving one and of implementing STONITH in certain environments. As many of you know, “split-brain” is a condition that can happen when each node in a cluster thinks that it is the only active node. The system as a whole loses grip on its “state”; nodes can go rogue, and data sets can diverge without making it clear which one is primary. Data loss or data corruption can result, but there are ways to make sure this doesn’t happen, so I was interested in probing further.

Fencing is not always the solution


The Split brain problem can be solved by DRBD Quorum.

To make it more interesting, it turned out that the speaker's company uses DRBD and Pacemaker for HA, a setup that is very familiar to us. After the talk, I approached the speaker and recommended that they consider "fencing" as a way to avoid split-brain. Fencing regulates access to a shared resource and can be a good safeguard, but best practice is to give fencing a communication path separate from the one it is trying to protect, so it requires redundant networking. Unfortunately, in his environment, redundant networking was not possible. We needed another method.

Split brain is solved via DRBD Quorum

After talking to the speaker, it was clear to me that a new option for avoiding split brain or diverging data sets was needed, since existing solutions may not always be feasible in certain infrastructures. This got me thinking about the various options for avoiding split-brain and how fencing could be implemented by using the built-in communication found in DRBD 9. It turns out that the capability of mirroring across more than two nodes, found in DRBD 9, is a viable solution.

That idea sparked the work on the newest feature in DRBD: Quorum.

Shortly thereafter, the LINBIT team developed and integrated a working solution into DRBD. The code was pushed to the LINBIT repository and ready for testing.

Interest was almost immediate!

Later on, I happened to meet a few folks from IBM UK. They were working on IBM MQ Advanced Software, the well-known messaging middleware software that helps integrate applications and data across multiple platforms. They intended to use DRBD for their replication needs and quickly became interested in the idea of using a Quorum mechanism to mitigate split-brain situations.

DRBD Quorum takes new perspective

The DRBD Quorum feature takes a new approach to avoiding data divergence. A cluster partition may only modify the replicated data set if the number of nodes that can communicate is greater than half of the overall number of nodes within the defined cluster. By only allowing writes in a partition that contains more than half of the nodes, we avoid creating diverging data sets.

The initial implementation of this feature would cause any node that lost quorum (and was running the application/data set) to be rebooted. Removing access to the data set is required to ensure that the node stops modifying data. After extensive testing, the IBM team suggested a new idea: instead of rebooting the node, terminate the application. This action then triggers the already available recovery process, forcing services to migrate to a node with quorum!
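For illustration, the relevant options in a DRBD 9 resource configuration might look like the following minimal sketch; the resource name r0 is a placeholder, and the user's guide links at the end of this post document the authoritative syntax:

resource r0 {
  options {
    quorum majority;        # a partition may only write if it holds more than half of the nodes
    on-no-quorum io-error;  # on quorum loss I/O fails, which makes the application terminate
  }
  # connections, volumes, etc. as usual
}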

Attractive alternative to fencing

As usual, the devil is in the details. Getting the implementation right, with the appropriate resync decisions, was not as straightforward as one might think. In addition to our own internal testing, many IBM engineers tested it as well. We are happy to report that the current implementation does exactly what was expected!

Bottom line:

If you need to mirror your data set three times, the new DRBD Quorum feature is an attractive alternative to hardware fencing.

In case you want to learn more about the Quorum implementation in DRBD, please see the DRBD9 user's guide:
https://docs.linbit.com/docs/users-guide-9.0/#s-feature-quorum
https://docs.linbit.com/docs/users-guide-9.0/#s-configuring-quorum

Image: Lloyd Fugde (stock.adobe.com)

Philipp Reisner on Linkedin
Philipp Reisner
Philipp Reisner is founder and CEO of LINBIT in Vienna/Austria. His professional career has been dominated by developing DRBD, a storage replication for Linux. Today he leads a company of about 30 employees with locations in Vienna/Austria and Portland/Oregon.

 

 

LINBIT’s DRBD ships with integration to VCS

The LINBIT DRBD software has been updated with an integration for Veritas InfoScale Availability (VIA). VIA, formerly known as Veritas Cluster Server (VCS), is a proprietary cluster manager for building highly available clusters on Linux, for example for network file shares, databases, or e-commerce websites. VCS solves the same problem as the open source Pacemaker project.

Yet, in contrast to Pacemaker, VCS has a long history on the Unix Platform. VCS came to Linux as Linux began to surpass legacy Unix platforms. In addition to its longevity, VCS has a strong and clean user experience. For example, VCS is ahead of the Pacemaker software when it comes to clarity of log files. Notably, the Veritas Cluster Server has slightly fewer features than Pacemaker. (With great power comes complexity!)


The gear runs even smoother. DRBD has an integration for VCS.

VCS integration for DRBD

Since January 2018, DRBD has been shipping with an integration for VCS. Users are now able to use VCS instead of Pacemaker and even control DRBD via VCS. The integration consists of two agents, DRBDConfigure and DRBDPrimary, which enable drbd-8.4 and drbd-9.0 for VCS.

Full documentation can be found here on our website:

https://docs.linbit.com/docs/users-guide-9.0/#s-feature-VCS

and

https://github.com/LINBIT/drbd-utils/tree/master/scripts/VCS

Besides VCS, LINBIT DRBD supports a variety of Linux cluster software, so you can keep your system up and running:

Pacemaker 1.0.11 and up
Heartbeat 3.0.5 and up
Corosync 2.x and up

 

Reach out to [email protected] for more information.

We are driven by the passion of keeping the digital world running. That’s why hundreds of customers trust in our expertise, services and products. Our OpenSource product DRBD has been installed several million times. Linbit established DRBD® as the industry standard for High-Availability (HA) and data redundancy for mission critical systems. DRBD enables disaster recovery and HA for any application on Linux, including iSCSI, NFS, MySQL, Postgres, Oracle, Virtualization and more.

Philipp Reisner on Linkedin
Philipp Reisner
Philipp Reisner is founder and CEO of LINBIT in Vienna/Austria. His professional career has been dominated by developing DRBD, a storage replication for Linux. Today he leads a company of about 30 employees with locations in Vienna/Austria and Portland/Oregon.

 

Dreaded Day of Downtime

Some say that no one dreads a day of downtime like a storage admin.

I disagree. Sure, the storage admins might be responsible for recovering a whole organization if an outage occurs; and sure, they might be the ones who lose their jobs from an unexpected debacle, but I would speculate that others have more to lose.

First, the company's reputation takes a big, possibly irreparable hit with both clients and employees. Damage control usually lasts far longer than the original outage. Take the United Airlines case from earlier in 2017, when a computer malfunction led to the grounding of all domestic flights. Airports across the country were forced to tweet out messages about the technical issues after receiving an overwhelming number of complaints. An outage such as this one can take months or years of repairing trust with your customers. Depending upon the criticality of the services, a company could go bankrupt. Despite all this, even the company isn't the biggest loser; it is the end-user, and that is what the rest of this post will focus on.

Let’s say you’re a senior in college. It’s spring term, and graduation is just one week away.  Your school has an online system to submit assignments which are due at midnight, the day before finals week. Like most students at the school, you log into the online assignment submission module, just like you have always done.  Except this time, you get a spinning wheel. Nothing will load. It must be your internet connection. You call a friend to have them submit your papers, but she can’t login either. The culprit: the system is down.

Now, it’s 10:00 PM and you need to submit your math assignment before midnight. At 11:00 PM you start to panic. You can’t log-in and neither can your classmates.  Everyone is scrambling. You send a hastily written email to your professor explaining the issue. She is unforgiving because you shouldn’t have procrastinated in the first place. At 1:00 AM, you refresh the system and everything is working (slowly), but the deadlines have passed. The system won’t let you submit anything. Your heart sinks as you realize that without that project, you will fail your math class and not be able to graduate.

This system outage caused heartache, stress, and uncertainty for the students and teachers, along with a whole lot of pain for the administrators. The kicker is that the downtime happened when traffic was anticipated to be the highest! Of course the servers are going to be overloaded during the last week of spring term. Yet, predictably, the university will send an email stating that it experienced higher than expected loads and that, ultimately, it wasn't prepared for them.

During this time, traffic was 15 times its normal usage, and the Hypervisor hosting the NFS server and the file sharing system was flooded with requests.  It blew a fan and eventually overheated. Sure, the data was still safe inside the SAN on the backend.  However, none of that mattered when the students couldn’t access the data until the admin rebuilt the Hypervisor. By the time the server was back up and running, the damage was done.

High Availability isn't a simple concept, but it is critical for your organization, your credibility, and, even more importantly, for your end-users or customers. In today's world, the bar for "uptime" is monstrously high; downtime is simply unacceptable.

If you're a student, an admin, or a simple system user, I have a question for you (and don't just think about yourself; think about your boss, colleagues, and clients):

What would your day look like if your services went unresponsive right… NOW?!
Learn more about the costs and drivers of data loss, and how to avoid it, by reading the paper from OrionX Research.

 

Greg Eckert on Linkedin
Greg Eckert
In his role as the Director of Business Development for LINBIT America and Australia, Greg is responsible for building international relations, both in terms of technology and business collaboration. Since 2013, Greg has connected potential technology partners, collaborated with businesses in new territories, and explored opportunities for new joint ventures.