Posts

man-person-jumping-desert

LINSTOR grows beyond DRBD

For quite some time, LINSTOR has been able to use NVMe-oF storage targets via the Swordfish API. This was expressed in LINSTOR as a resource definition that contains a single resource with one backing disk (that is the NVMe-oF target) and one diskless resource (that is the NVMe-oF initiator).

Layers in the storage stack

In the last few months the team has been busy making LINSTOR more generic, adding support for resource templates. A resource template describes a storage stack in terms of layers for specific resources/volumes. Here are some examples of such storage stacks:

    • DRBD on top of logic volumes (LVM)
    • DRBD on top of zVols (ZFS)
    • Swordfish initiator & target on top of logic volumes (LVM)
    • DRBD on top of LUKS on top of logic volumes (LVM)
  • LVM only

The team came up with an elegant approach that introduces these additional resource templates in ways that allow existing LINSTOR configurations to keep their semantics as the default resource templates.

With this decoupling, we no longer need to have DRBD installed on LINSTOR clusters that do not require the replication functions of DRBD.

What does that mean for DRBD?

The interests of LINBIT’s customers vary widely. Some want to use LINSTOR without DRBD – which is now supported. A very prominent example of this is Intel, who uses LINSTOR in its Rack Scale Design effort to connect storage nodes and compute nodes with NVMe-oF. In this example, the storage is disaggregated from the other nodes.

Other customers see converged architectures as a better fit. For converged scenarios, DRBD has many advantages over a pure data access protocol such as NVMe-oF. LINSTOR is built from the ground up to manage DRBD, therefore, the need for DRBD support will remain.

Linux-native NVMe-oF and NVMe/TCP

SNIA’s Swordfish has clear benefits with creating a standard for managing storage targets such as allowing optimized storage target implementations, as well as a hardware-accelerated data-path, non-Linux control path.

Due to the fact that Swordfish is an extension of Redfish, which needs to be implemented in the Baseboard Management Controller (BMC), we have decided to extend LINSTOR’s driver set to configure NVMe-oF target and initiator software. We do this by utilizing existing tools found within the Linux operating system, eliminating the need for a Swordfish software stack.

Summary

LINSTOR now supports configurations without DRBD. It is now a unified storage orchestrator for replicated and non-replicated storage.

Philipp Reisner on Linkedin
Philipp Reisner
Philipp Reisner is founder and CEO of LINBIT in Vienna/Austria. His professional career has been dominated by developing DRBD, a storage replication for Linux. Today he leads a company of about 30 employees with locations in Vienna/Austria and Portland/Oregon.
LINSTOR High Level Resource API

High Level Resource API – The simplicity of creating replicated volumes

In this blog post, we present one of our recent extensions to the LINSTOR ecosystem: A high-level, user-friendly Python API that allows simple DRBD resource management via LINSTOR.

Background: So far LINSTOR components communicated by the following means: Via Protocol Buffers, or via the Python API that is used in the linstor command line client. Protocol Buffers are a great way to transport serialized structured data between LINSTOR components, but by themselves they don’t provide the necessary abstraction for developers.

That is not the job of Protocol Buffers. Since the early days we split the command line client into the client logic (parsing configuration files, parsing command line arguments…), and a Python library (python-linstor). This Python library provides all the bits and pieces to interact with LINSTOR. For example it provides a MultiLinstor class that handles TCP/IP communication to the LINSTOR controller. Additionally, it allows all the operations that are possible with LINSTOR (e.g. creating nodes, creating storage pools…). For perfectly valid reasons this API is very low level and pretty close to the actual Protocol Buffer messages sent to the LINSTOR controller.

By developing more and more plugins to integrate LINSTOR into other projects like OpenStack, OpenNebula, Docker Volumes, and many more, we saw that there is need for a higher level abstraction.

Finding the Right Abstraction

The first dimension of abstraction is to abstract from LINSTOR internals. For example it perfectly makes sense that recreating an existing resource is an error on a low level (think of it as EEXIST). On a higher level, depending on the actual object, trying to recreate an object might be perfectly fine and one wants to get the existing object (i.e. idem-potency).

The second dimension of abstraction is from DRBD and LINSTOR as a whole. Developers dealing with storage already have a good knowledge about concepts like nodes, storage pools, resource, volumes, placement policies… This is the part where we can make LINSTOR and DRBD accessible for new developers.

The third goal was to only provide a set of objects that are important in the context of the user/developer. This, for example, means that we can assume that the LINSTOR cluster is already set up, so we do not need to provide a high-level API to add nodes. For the higher-level API we can focus on [LINSTOR] resources. This allows us to satisfy the KISS (keep-it-simple-stupid) principle. A forth goal was to introduce new, higher-level concepts like placement policies. Placement policies/templates are concepts currently developed in core LINSTOR, but we can already provide basics on a higher level.

Demo Time

We start by creating a 10 GB big replicated LINSTOR/DRBD volume in a 3 node cluster. We want the volume to be 2 times redundant. Then we increase the size of the volume to 20 GB.

$ python
>> import linstor
>> foo = linstor.Resource('foo')
>> foo.volumes[0] = linstor.Volume("10 GB")

There are multiple ways to specify the size.

>> foo.placement.redundancy = 2
>> foo.autoplace()
>> foo.volumes[0].size += 10 * (2 ** 30)

This line is enough to resize a replicated volume cluster wide.

We needed 5 lines of code to create a replicated DRBD volume in a cluster! Let that sink in for a moment and compare it to the steps that were necessary without LINSTOR: Creating backing devices on all nodes, writing and synchronizing DRBD res(ource) files, creating meta-data on all nodes, drbdadm up the resource and force one to the Primary role to start the initial sync.

For the next step we assume that the volume is replicated and that we are a storage plugin developer. Our goal is to make sure the volume is accessible on every node because the block device should be used in a VM. So, A) make sure we can access the block device, and B) find out what the name of the block device of the first volume actually is:

>>> foo.activate(socket.gethostname())
>>> print(foo.volumes[0].device_path)

The method activate is one of these methods that shows how we intended abstraction. Note that we autoplaced the resource 2 times in a 3-node cluster. So LINSTOR chose the nodes that fit best. But now we want the resource to be accessible on every node without increasing the redundancy to 3 (because that would need additional storage and 2 times replicated data is good enough).

Diskless clients

Fortunately DRBD has us covered as it has the concept of diskless clients. These nodes provide a local block device as usual, but they read and write data from/to their peers only over the network (i.e. no local storage). Creating this diskless assignment is not necessary if the node was already part of the replication in the first place (then it already has access to the data locally).

This is exactly what activate does: If the node can already access the data – fine, if not, create a diskless assignment. Now assume we are done and we do not need access to the device anymore. We want to do some cleanup because we do not need a diskless assignment:

>>> foo.deactivate(socket.gethostname()) 

The semantic of this method is to remove the assignment if it is diskless (as it does not contribute to actual redundancy), but if it is a node that stores actual data, deactivate does nothing and keeps the data as redundant as it was. This is only a very small subset of the functionality the high-level API provides, there is a lot more to know like creating snapshots, converting diskless assignments to diskful ones and vice versa, or managing DRBD Proxy. For more information check the online documentation.

If you want to go deeper into the LINSTOR universe, please visit our youtube channel.

Roland Kammerer
Software Engineer at Linbit
Roland Kammerer studied technical computer science at the Vienna University of Technology and graduated with distinction. Currently, he is a PhD candidate with a research focus on time-triggered realtime-systems and works for LINBIT in the DRBD development team.
windows drbd kernel driver

WinDRBD: DRBD for Windows is Coming

Plugin for Linstor with OpenNebula

How to Setup LINSTOR with OpenNebula

This post will guide you through the setup of the LINSTOR – OpenNebula Addon. After completing it, you will be able to easily live-migrate virtual machines between OpenNebula nodes, and additionally, have data redundancy.

Setup Linstor with OpenNebula

This post assumes that you already have OpenNebula installed and running on all of your nodes. At first I will give you a quick guide for installing LINSTOR, for a more detailed documentation please read the DRBD User’s Guide. The second part will show you how to add a LINSTOR image and system datastore to OpenNebula.

We will assume the following node setup:

Node name IP Role
alpha 10.0.0.1 Controller/ON front-end
bravo 10.0.0.2 Virtualization host
charlie 10.0.0.3 Virtualization host
delta 10.0.0.4 Virtualization host

Make sure you have configured 2 lvm-thin storage pools named linstorpool/thin_image and linstorpool/thin_system on all of your nodes.

LINSTOR setup

Install LINSTOR packages

The easiest setup is to install the linstor-controller on the same node as the OpenNebula cloud front-end. The linstor-opennebula package contains our OpenNebula driver, and therefore, is essential on the OpenNebula cloud front-end node. On this node install the following packages:

apt install drbd-dkms drbd-utils python-linstor linstor-satellite linstor-client linstor-controller linstor-opennebula

After the installation completes start the linstor-controller and enable the service:

systemctl start linstor-controller
systemctl enable linstor-controller

On all other virtualization nodes you do not need the linstor-controllerlinstor-client or linstor-opennebula package:

apt install drbd-dkms drbd-utils python-linstor linstor-satellite

For all nodes (including the controller) you have to start and enable the linstor-satellite:

systemctl start linstor-satellite
systemctl enable linstor-satellite

Now all LINSTOR-related services should be running.

Adding and configuring LINSTOR nodes

All nodes that should work as virtualization nodes need to be added to LINSTOR, so that storage can be distributed and activated on all nodes:

linstor node create alpha 10.0.0.1 --node-type Combined
linstor node create bravo 10.0.0.2
linstor node create charlie 10.0.0.3
linstor node create delta 10.0.0.4

Now we will configure the system and image lvm-thin pools with LINSTOR:

linstor storage-pool create lvmthin alpha open_system linstorpool/thin_system
linstor storage-pool create lvmthin bravo open_system linstorpool/thin_system
linstor storage-pool create lvmthin charlie open_system linstorpool/thin_system
linstor storage-pool create lvmthin delta open_system linstorpool/thin_system

linstor storage-pool create lvmthin alpha open_image linstorpool/thin_image
linstor storage-pool create lvmthin bravo open_image linstorpool/thin_image
linstor storage-pool create lvmthin charlie open_image linstorpool/thin_image
linstor storage-pool create lvmthin delta open_image linstorpool/thin_image

For testing we can now try to create a dummy test resource:

linstor resource-definition create dummy
linstor volume-definition create dummy 10M
linstor resource create dummy --auto-place 3 -s open_image

If everything went fine with the above commands you should be able to see a resource created on 3 nodes using our default lvm-thin storage pool:

linstor resource list-volumes

Now we can delete the created dummy resource:

linstor resource-definition delete dummy

LINSTOR is now setup and ready to be used by OpenNebula.

OpenNebula LINSTOR datastores

OpenNebula uses different types of datastores: system, image and files.

LINSTOR supports the system and image datastore types.

  • System datastore is used to store a small context image that stores all information needed to run a virtual machine (VM) on a node.
  • Image datastore as it name reveals stores VM images.

OpenNebula doesn’t need to be configured with a LINSTOR system datastore; it will also work with its default system datastore, but using LINSTOR system datastore gives it some data redundancy advantages.

Setup LINSTOR datastore drivers

As LINSTOR is an addon driver for OpenNebula, the LINSTOR OpenNebula driver needs to be added to it, to do so you need to modify the /etc/one/oned.conf and add linstor to the TM_MAD and DATASTORE_MAD sections.

TM_MAD = [
  executable = "one_tm",
  arguments = "-t 15 -d dummy,lvm,shared,fs_lvm,qcow2,ssh,vmfs,ceph,linstor"
]

Note that for the DATASTORE_MAD section the linstor driver has to specified 2 times (image datastore and system datastore).

DATASTORE_MAD = [
    EXECUTABLE = "one_datastore",
    ARGUMENTS  = "-t 15 -d dummy,fs,lvm,ceph,dev,iscsi_libvirt,vcenter,linstor -s shared,ssh,ceph,fs_lvm,qcow2,vcenter,linstor"
]

And finally at the end of the configuration file, add new TM_MAD_CONF and DS_MAD_CONF sections for the linstor driver:

TM_MAD_CONF = [
    name = "linstor", ln_target = "NONE", clone_target = "SELF", shared = "yes", ALLOW_ORPHANS="yes"
]

DS_MAD_CONF = [
    NAME = "linstor", REQUIRED_ATTRS = "BRIDGE_LIST", PERSISTENT_ONLY = "NO", MARKETPLACE_ACTIONS = "export"
]

Now restart the OpenNebula service.

Adding LINSTOR datastore drivers

After we registered the LINSTOR driver with OpenNebula we can add the image and system datastore.

For the system datastore we will create a configuration file and add it with the onedatastore tool. If you want to use more than 2 replicas, just edit the LINSTOR_AUTO_PLACE value.

cat >system_ds.conf <<EOI
NAME = linstor_system_auto_place
TM_MAD = linstor
TYPE = SYSTEM_DS
LINSTOR_AUTO_PLACE = 2
LINSTOR_STORAGE_POOL = "open_system"
BRIDGE_LIST = "alpha bravo charlie delta"
EOI

onedatastore create system_ds.conf

And we do nearly the same for the image datastore:

cat >image_ds.conf <<EOI
NAME = linstor_image_auto_place
DS_MAD = linstor
TM_MAD = linstor
TYPE = IMAGE_DS
DISK_TYPE = BLOCK
LINSTOR_AUTO_PLACE = 2
LINSTOR_STORAGE_POOL = "open_image"
BRIDGE_LIST = "alpha bravo charlie delta"
EOI

onedatastore create image_ds.conf

Now you should see 2 new datastores in the OpenNebula web front-end that are ready to use.

Usage and Notes

The new datastores can be used in the usual OpenNebula datastore selections and should support all OpenNebula features.

The LINSTOR datastores have also some configuration options that are described on the drivers github repository page.

Data distribution

The interested reader can check which ones were selected via LINSTOR resource list.

linstor resource list

While interesting, it is important to know that the storage can be accessed by all nodes in the cluster via a DRBD feature called “diskless clients”. So let’s assume “alpha” and “bravo” had the most free space and were selected, and the VM was created on node “bravo”. Via the low level tool drbdadm status we now see that the resource is created on two nodes (i.e., “alpha” and “bravo”) and the DRBD resource is in “Primary” role on “bravo”.

Now we want to migrate the VM from “bravo” to node “charlie”. This is again done via a few clicks in the GUI, but the interesting steps happen behind the scenes: The storage plugin realizes that it has access to the data on “alpha” and “bravo” (our two replicas) but also needs access on “charlie” to execute the VM. The plugin therefore creates a diskless assignment on “charlie”. When you execute drbdadm status on “charlie”, you see that now three nodes are involved in the overall picture:

  • Alpha with storage in Secondary role
  • Bravo with storage in Secondary role
  • Charlie as a diskless client in Primary role

Diskless clients are created (and deleted) on demand without further user interaction, besides moving around VMs in the GUI. This means that if you now move the VM back to “bravo”, the diskless assignment on “charlie” gets deleted as it is no longer needed.

If you would have moved the VM from “charlie” to “delta”, the diskless assignment for “charlie” would have been deleted, and a new one for “delta” would have been created.

For you it is probably even more interesting that all of this including VM migration happens within seconds without moving the actual replicated storage contents.

Check this for LINSTOR and OpenNebula:

 

 

Rene Peinthor
Software developer
Rene was one of the first developers seeing a DRBD resource deployed by LINSTOR and is software developer at LINBIT since 2017.
While not squashing bugs in LINSTOR, Rene is either climbing or paragliding down a mountain.

LINBIT SDS Adds Disaster Recovery and Support for Kubernetes

LINBIT SDS (Linux SDS) will showcase cloud-native enterprise storage management at KubeCon + CloudNativeCon in Seattle

Beaverton, OR, Dec. 3, 2018 – LINBIT enhances open source software-defined storage (SDS) by providing disaster recovery (DR) replication for critical data. LINBIT SDS is an enterprise-class storage management solution designed for cloud and container storage workloads.

To simplify administration, enhance user experience, and accelerate integration with other software, LINBIT SDS relies on the pre-existing storage management capabilities native to Linux, such as LVM and DRBD. These capabilities are complemented by LINSTOR, a feature-rich volume management software. One supported storage tool is DRBD, the in-kernel block level data-replication for Linux. By announcing support for DRBD Proxy, LINBIT extends replication to disaster recovery scenarios since DRBD Proxy enables fast and reliable data replication over any distance by resolving network communications and handling data access latencies.

“LINBIT SDS is rapidly becoming the reliable, high performance, and economical choice for enterprise and cloud workloads,” said Brian Hellman COO of LINBIT. “With simplified support for DR, LINBIT SDS is surpassing the costly and complex proprietary cloud storage solutions.”

LINBIT SDS provides a host of capabilities to manage persistent block storage for Kubernetes environments. It supports logical volume management (LVM) snapshots, which enhance application availability while minimizing data loss; thin provisioning, which improves efficient resource utilization in virtualized environments; and volume management, which simplifies tasks such as adding, removing, or replicating storage volumes.

LINBIT SDS supports Kubernetes

The Linux based SDS solution works with the leading cloud projects Kubernetes, OpenStack, and OpenNebula, as well as a range of virtualization platforms, and as a stand-alone product. Learn more about how the software works by watching a short video-demo here:

Persistent Kubernetes Storage for Databases (MySQL) with LINSTOR and DRBD (Demo)

OpenStack Cinder: Open-Source Volume Management with LINSTOR and DRBD

Linux Disaster Recovery Replication with DRBD Proxy (Demo)

Kubernetes + LINSTOR Container Failover (Demo)

LINBIT is a member of the Linux Foundation and is proud to support KubeCon, a Linux Foundation conference. Visit us at KubeCon + CloudNativeCon at booth #S7, December 11th-13th, 2018 in Seattle.

ABOUT LINBIT

LINBIT is the force behind DRBD and is the de facto open standard for High Availability (HA) software for enterprise and cloud computing. The LINBIT DRBD software is deployed in millions of mission-critical environments worldwide to provide High Availability (HA), Geo-Clustering for Disaster Recovery (DR), and Software Defined Storage (SDS) for OpenStack and OpenNebula based clouds. Don’t be shy. Visit us at LINBIT.com.

###

LINBIT DR and LINSTOR demo

Demo of Extending LINSTOR-Managed DRBD Volume to a DR Node

In this video Matt Kereczman from LINBIT combines components of LINBIT SDS and LINBIT to demonstrate extending an existing LINSTOR managed DRBD volume to a disaster recovery node, located in a geographically-separated datacenter via LINSTOR and DRBD proxy.

Watch the video:

He’s already created a LINSTOR cluster on four nodes: linstor-a, linstor-b, linstor-c and linstor-dr.

You can see that linstor-dr is in a different network than our other three nodes. This network exists in the DR DC, which is connected to our local DC via a 40Mb/s WAN link.

He has a single DRBD resource defined, which is currently replicated synchronously between the three peers in our local datacenter. He’s listed out his LINSTOR-managed resources and volumes which is currently mounted on linstor-a:

Before he adds a replica of this volume to the DR node in his DR datacenter, he’ll quickly test the write throughput of his DRBD device, so he has a baseline of how well it should perform.

He uses the dd to test. Read more

mission-critical-10yr-linbit

LINBIT USA Celebrates 10 Year Anniversary

LINBIT US is celebrating a decade of service and growth. 10 years ago, we started our journey with you from a newly established office in the pacific northwest. In that time, we have moved into new offices, grown our team 4 times in size, built some really great software, and most importantly, met, collaborated with, and served some of the most sophisticated customers along the way. Here’s a snapshot of some of the major milestones told in the present tense.

2010: Our bread and butter has always been High Availability. LINBIT HA software, DRBD, is now in the Linux mainline kernel since 2010, as of release 2.6.33. This promises to be a standout event that makes enterprise-grade HA a standard capability within Linux and puts the open source community on par with the best of proprietary systems out there.

2015: Fast forward to 2015. LINBIT is a company that is actually being talked about as the best solution for huge enterprises! Hundreds of thousands of servers depend on the replication that DRBD provides. All our customers are doing really cool work. And some of them are very well known, such as Cisco and Google. We are forming strong partnerships across North and South America– think RedHat and Suse.

New Horizon: Disaster Recovery

2016: Not only is the LINBIT HA product a success, but our new product focused on disaster recovery, DRBD Proxy, is  proving to be incredibly useful to companies who need to replicate data across distances. LINBIT is having wonderful success in providing clients peace of mind in case a disaster strikes, or perhaps a clumsy admin pulls on some cables they weren’t supposed to be pulling on! Oh, and we can’t forget our fun videos that go along with these products: LINBIT DR, LINBIT HA, and LINBIT SDS.

More in 2016: The official release of DRBD9 to the public. A huge move for enterprises looking to have multiple replicas of their data (up to 32!). Now, companies can implement software-defined storage (SDS) for creating, managing and running a cloud storage environment.

New Kid on the Block: LINSTOR

2018: Now that SDS is a feature, many clients are looking for it. LINBIT is making it even easier, and plausible, with the release of LINSTOR. With this, everything is automated. Deploying a DRBD volume has never been easier.

2018: At this point we would be remiss if we didn’t mention that LINSTOR has Flex Volume & External Provisioner drivers for Kubernetes. We now provide persistent storage to high performance containerized applications! Here is a LINSTOR demo, showing you just how quick and easy it is to deploy a DRBD cluster with 20 resources.

Now: A new guide describes  DRBD for the Microsoft Azure cloud service. We have partners and resellers who have end clients running Windows servers that need HA. One of our engineers even created a video of an NFS failover in Azure!

What else? There is almost too much to say about the past 10 years and the amount of growth and change is astonishing. However, at our core, we are the same. We believe in open source. In building software that turns the difficult into fast, robust, and easy. In our clients. In our company.

“We are grateful”

During a conversation at Red Hat Summit this year, LINBIT COO Brian Hellman was asked how long he had been at LINBIT.  “I replied ‘10 years in September.’ The gentleman was surprised; ‘That’s a long time, especially in the tech industry’.  To which he replied, ‘I love what I do and the people I work with — Not only the members of the LINBIT team, but also our customers, partners, and our extended team.  Without them we wouldn’t be here, they make it all possible and for that we are grateful.”

To whomever is reading this, wherever you are, you were part of it. You ARE part of it! So a big thank you for reading, caring, and hopefully using LINBIT HA, LINBIT DR, or LINBIT SDS. Cheers to another 10 years!


Kelsey Swan
Kelsey turns her personal passion for connecting with people into a supporting LINBIT clients. As the Accounts Manager for LINBIT USA, Kelsey engages with customers to provide them with the best experience possible. From Enterprise companies, to Mom and Pop shops, Kelsey ensures the implementation of LINBIT products goes smoothly. Doing what is best for the client is her #1 priority.
bandwidth-close-up-computer-1148820

Replicating storage volumes on Scaleway ARM with LINSTOR

I’ve been using Scaleway for a while as a platform to spin-up both personal and work machines, mainly because they’re good value and easy to use. Scaleway offers a wide selection of Aarch64 and x86 machines at various price points, however none of these VMs are replicated – not even with RAID at the hardware level – you’re expected to handle that all yourself. Since ARM servers have been making headlines for several years as a competing architecture to x86 in the data center, I thought it would be interesting to set up replication across two ARM Scaleway VMs with DRBD and LINSTOR.

It’s worth pointing out here that if you’re planning on building a production HA environment on Scaleway, you should also reach out to their support team and have them confirm that your replicated volumes aren’t actually sitting on the same spinning disk in case of drive failure, as advised in their FAQ.

Preparing VMs

Linstor scaleway drbd-arm 6

First, we need a couple of VMs with additional storage volumes to replicate. The ARM64-2GB VM doesn’t allow for mounting additional volumes, so let’s go for the next one up, and add an additional 50GB LSSD volume.

Linstor scaleway drbd-arm 2

I’ve gone with an Ubuntu image, if you selected an RPM-based image, substitute package manager commands accordingly. I want to run the following commands on all VMs (in my case I have two, and will be using the first as both my controller and also a satellite node).

$ sudo apt update && sudo apt upgrade

In this case we’ll be deploying DRBD nodes with LINSTOR. We need DRBD9 to do this, but we can’t build a custom kernel module without first getting some prerequisite files for Scaleway’s custom kernel and preparing for a custom kernel module build. Scaleway provides a recommended script to run – we need to save that script and run it before installing DRBD9. I’ve put it in a file on github to make things simple:

$ sudo apt install -y build-essential libssl-dev
$ wget https://raw.githubusercontent.com/dabukalam/scalewaycustommodule/master/scalewaycustommodule
$ chmod +x scalewaycustommodule && sudo ./scalewaycustommodule

Getting LINSTOR

Once that’s done, we can add the LINBIT community repository and install DRBD, LINSTOR, and LVM:

$ sudo add-apt-repository -y ppa:linbit/linbit-drbd9-stack
$ sudo apt update
$ sudo apt install drbd-dkms linstor-satellite linstor-client lvm2

Now I can start the LINSTOR satellite service with:

$ sudo systemctl enable --now linstor-satellite

And make sure the VMs can see each other by adding the other node to each hosts file:

Linstor scaleway drbd-arm 3

Let’s make sure LVM is running and create a volume group for LINSTOR on our additional volume:

$ systemctl enable --now lvm2-lvmetad.service
$ systemctl enable --now lvm2-lvmetad.socket
$ sudo vgcreate sw_ssd /dev/vdb

That’s it for commands you need to run on both nodes. From now on we’ll be running commands on our favorite VM. LINSTOR has four node types – Controller, Auxiliary, Combined, and Satellite. Since I only have two nodes, one will be Combined, and one will be a Satellite. Combined here means that the node is both a Controller and a Satellite.

Adding nodes to the LINSTOR cluster

So on our favorite VM, which we’re going to use as the combined node, we add the local host to the LINSTOR cluster as a combined node, and the other as a satellite:

$ sudo apt install -y linstor-controller
$ sudo systemctl enable --now linstor-controller
$ linstor node create --node-type Combined drbd-arm 10.10.43.13
$ linstor node create --node-type Satellite drbd-arm-2 10.10.25.5
$ linstor node list

It’s worth noting here that you can run commands to manage LINSTOR on any node, just make sure you have the controller node exported as a variable

drbd-arm-2:~$ export LS_CONTROLLERS=drbd-arm

You should now have something that looks like this:

Linstor scaleway drbd-arm 4

Now we have our LINSTOR cluster setup, we can create a storage-pool across the nodes with the same name ‘swpool’, referencing the node name, specifying we want lvm, and the volume group name:

$ linstor storage-pool create drbd-arm swpool lvm sw_ssd
$ linstor storage-pool create drbd-arm-2 swpool lvm sw_ssd

We can then define new resource and volume types, and use them to create the resource. You can perform a whole range of operations at this point including manual node placement and specifying storage pools. Since we only have one storage pool, LINSTOR will automatically select that for us. I only have two nodes so I’ll just autoplace my storage cluster across two.

$ linstor resource-definition create backups
$ linstor volume-definition create backups 40G
$ linstor resource create backups --auto-place 2

LINSTOR will now handle all the resource creation automagically across all our nodes, including dealing with LVM and DRBD. If all succeeds, you should now be able to see your resources. They’ll be inconsistent while DRBD syncs them up. You can also now see the DRBD resources by running drbdmon. Once it’s finished syncing you’ll see a list of your replicated nodes as below (only drbd-arm-2 in my case):

You can now mount the drive on any of the nodes and write to your new replicated storage cluster.

$ linstor resource list-volumes

Linstor scaleway drbd-arm 7

In this case the device name is /dev/drbd1000, so once we create a filesystem on it and mount it I can now write to my new new replicated storage cluster.

$ sudo mkfs /dev/drbd1000
$ sudo mount /dev/drbd1000 /mnt
$ sudo touch /mnt/file

 

 

Danny Abukalam on Linkedin
Danny Abukalam
Danny is a Solutions Architect at LINBIT based in Manchester, UK. He works in conjunction with the sales team to support customers with LINBIT's products and services. Danny has been active in the OpenStack community for a few years, organising events in the UK including the Manchester OpenStack Meetup and OpenStack Days UK. In his free time, Danny likes hunting for extremely hoppy IPAs and skiing, not at the same time.

A Highly Available LINSTOR Controller for Proxmox

For the High Availability setup we describe in this blog post, we assume that you installed LINSTOR and the Proxmox Plugin as described in the Proxmox section of the users guide or our blog post.

The idea is to execute the LINSTOR controller within a VM that is controlled by Proxmox and its HA features, where the storage resides on DRBD, managed by LINSTOR itself.

Preparing the Storage

The first step is to allocate storage for the VM by creating a VM and selecting “Do not use any media” on the “OS” section. The hard disk should reside on DRBD (e.g., “drbdstorage”). Disk space should be at least 2GB, and for RAM we chose 1GB. These are the minimal requirements for the appliance LINBIT provides to its customers (see below). If you set up your own controller VM, or resources are not constrained, increase these minimal values. In the following, we assume that the controller VM was created with ID 100, but it is fine if this VM is created later (after you have already created other VMs).

LINSTOR Controller Appliance

LINBIT provides an appliance for its customers that can be used to populate the created storage. For the appliance to work, we first create a “Serial Port.” First, click on “Hardware” and then on “Add” and finally on “Serial Port.” See image below:

proxmox_serial1_controller_vm

If everything worked as expected, the VM definition should then look like this:

proxmox_add_serial2_controller_vm

The next step is to copy the VM appliance to the created storage. This can be done with qemu-img. Make sure to replace the VM ID with the correct one:

# qemu-img dd -O raw if=/tmp/linbit-linstor-controller-amd64.img \
 of=/dev/drbd/by-res/vm-100-disk-1/0

After that, you can start the VM and connect to it via the Proxmox VNC viewer. The default user name and password are both “linbit”. Note that we kept the defaults for SSH, so you will not be able to log in to the VM via SSH and username/password. If you want to enable that (and/or “root” login), enable these settings in /etc/ssh/sshd_config and restart the ssh service. As this VM is based on “Ubuntu Bionic”, you should change your network settings (e.g., static IP) in /etc/netplan/config.yaml. After that you should be able to ssh to the VM:

proxmox_ssh_controller_vm

Adding the Controller VM to the existing Cluster

In the next step, you add the controller VM to the existing cluster:

# linstor node create --node-type Controller \
 linstor-controller 10.43.7.254

As this special VM will be not be managed by the Proxmox Plugin, make sure all hosts have access to that VM’s storage. In our test cluster, we checked the linstor resource list to confirm where the storage was already deployed and then created further assignments via linstor resource create. In our lab consisting of four nodes, we made all resource assignments diskful, but diskless assignments are fine as well. As a rule of thumb keep the redundancy count at “3” (more usually does not make sense), and assign the rest diskless.

As the storage for this particular VM has to be made available (i.e., drbdadm up), enable the drbd.service on all nodes:

# systemctl enable drbd
# systemctl start drbd

At startup, the `linstor-satellite` service deletes all of its resource files (*.res) and regenerates them. This conflicts with the drbd services that needs these resource files to start the controller VM. It is good enough to first bring up the resources via drbd.service and then start linstor-satellite.service. To make the necessary changes, you need to create a drop-in for the linstor-satellite.service via systemctl (do
not edit the file directly).

# systemctl edit linstor-satellite
[Unit]
After=drbd.service

Switching to the New Controller

Now, it is time for the final steps — namely switching from the existing controller to the new one in the VM. Stop the old controller service on the old host, and copy the LINSTOR controller database to the VM:

# systemctl stop linstor-controller
# systemctl disable linstor-controller
# scp /var/lib/linstor/* [email protected]:/var/lib/linstor/

Finally, we can enable the controller in the VM:

# systemctl start linstor-controller # in the VM
# systemctl enable linstor-controller # in the VM

To check if everything worked as expected, you can query the cluster nodes on a host by asking the controller in the VM: linstor --controllers=10.43.7.254 node list. It is perfectly fine that the controller (which is just a controller and not “combined”) is shown as “OFFLINE”. Still, this might change in the future to something more appropriate.

As the last – but crucial – step, you need to add the “controllervm” option to /etc/pve/storage.cfg, and change the controller IP:

drbd: drbdstorage
  content images,rootdir
  redundancy 3
  controller 10.43.7.254
  controllervm 100

By setting the “controllervm” parameter the plugin will ignore (or act accordingly) if there are actions on the controller VM. Basically, this VM should not be managed by the plugin, so the plugin mainly ignores all actions on the given controller VM ID. However, there is one exception. When you delete the VM in the GUI, it is removed from the GUI. We did not find a way to return/kill it in a way that would keep the VM in the GUI. Yet such requests are ignored by the plugin, so the VM will not be deleted from the LINSTOR cluster. Therefore, it is possible to later create a VM with the ID of the old controller. The plugin will just return “OK”, and the old VM with the old data can be used again. To keep it simple, be careful to not delete the controller VM.

Enabling HA for the Controller VM in Proxmox

Currently, we have the controller executed as VM, but we should make sure that one instance of the VM is started at all times. For that we use Proxmox’s HA feature. Click on the VM; then on “More”; and then on “Manage HA.” We set the following parameters for our controller VM:

promox_manage_ha_controller_vm

Final Considerations

As long as there are surviving nodes in your Proxmox cluster, everything should be fine. In case the node hosting the controller VM is shut down or lost, Proxmox HA will make sure the controller is started on another host. The IP of the controller VM should not change. It is up to you as admin to make sure this is the case (e.g., setting a static IP, or always providing the same IP via dhcp on the bridged interface).

One limitation that is not fully handled with this setup is a total cluster outage (e.g., common power supply failure) with a restart of all cluster nodes. Proxmox is unfortunately pretty limited in that regard. You can enable the “HA Feature” for a VM, and you can define “Start and Shutdown Order” constraints. But both are completely separated from each other. Therefore it is difficult to ensure that the controller VM is up and all other VMs are started.

It might be possible to work around that by delaying VM startup in the Proxmox plugin until the controller VM is up (i.e., if the plugin is asked to start the controller VM it does it, otherwise it waits and pings the controller). While this is a nice idea, it would be a huge failure in a serialized, non-concurrent VM start/plugin call event stream where some VM should be started (which then blocks) before the controller VM is scheduled to be started. That would obviously result in a deadlock.

We will discuss options with Proxmox, but we think the presented solution is valuable in typical use cases as is, especially compared to the complexity of a Pacemaker setup. Use cases where one can expect that not the whole cluster goes down at the same time are (will be??) covered. And even if that is the case, only automatic startup of the VMs would not work when the whole cluster is started. In such a scenario, the admin just has to wait until the Proxmox HA service starts the controller VM. After that, all VMs can be started manually/scripted on the command line.

Roland Kammerer
Software Engineer at Linbit
Roland Kammerer studied technical computer science at the Vienna University of Technology and graduated with distinction. Currently, he is a PhD candidate with a research focus on time-triggered realtime-systems and works for LINBIT in the DRBD development team.

 

art-bridge-linstor-proxmox-plugin

How to setup LINSTOR on Proxmox VE

In this technical blog post, we show you how to integrate DRBD volumes in Proxmox VE via a storage plugin developed by LINBIT. The advantages of using DRBD include a configurable number of data replicas (e.g., 3 copies in a 5 node cluster), access to the data on every node and therefore very fast VM live-migrations (usually takes only a few seconds, depending on memory pressure). Download Linstor Proxmox Plugin

Setup

The rest of this post assumes that you have already set up Proxmox VE (the LINBIT example uses 4 nodes), and have created a PVE cluster consisting of all nodes. While this post is not meant to  replace the DRBD User’s Guide, we try to show a complete setup.

The setup consists of two important components:

  1. LINSTOR manages DRBD resource allocation
  2. linstor-proxmox plugin that implements the Proxmox VE storage plugin API and executes LINSTOR commands.

In order for the plugin to work, you must first create a LINSTOR cluster.

LINSTOR Cluster

We have assumed here that you have already set up the LINBIT Proxmox repository as described in the User’s guide. If you have not completed this set up, execute the following commands on all cluster nodes. First, we need the low-level infrastructure (i.e., the DRBD9 kernel module and drbd-utils):

apt install pve-headers
apt install drbd-dkms drbd-utils
rmmod drbd; modprobe drbd
grep -q drbd /etc/modules || echo "drbd" >> /etc/module

The next step is to install LINSTOR:

apt install linstor-controller linstor-satellite linstor-client
systemctl start linstor-satellite
systemctl enable linstor-satellite

Now, decide which of your hosts should be the current controller node and enable the linstor-controller service on that particular node only:

systemctl start linstor-controller

Volume creation

Obviously, DRBD needs storage to create volumes. In this post we assume a setup where all nodes contain an LVM-thinpool called drbdpool. In our sample setup, we created it on the pve volume group, but in your setup, you might have a different storage topology. On the node that runs the controller service, execute the following commands to add your nodes:

linstor node create alpha 10.0.0.1 --node-type Combined
linstor node create bravo 10.0.0.2 --node-type Combined
linstor node create charlie 10.0.0.3 --node-type Combined
linstor node create delta 10.0.0.4 --node-type Combined

“Combined” means that this node is allowed to execute a LINSTOR controller and/or a satellite, but a node does not have to execute both. So it is safe to specify “Combined”; it does not influence the performance or the number of services started.

The next step is to configure a storage pool definition. As described in the User’s guide, most LINSTOR objects consist of a “definition” and then concrete instances of such a definition:

linstor storage-pool-definition create drbdpool

By now it is time to mention that the LINSTOR client provides handy shortcuts for its sub-commands. The previous command could have been written as linstor spd c drbdpool. The next step is to register every node’s storage pool:

for n in alpha bravo charlie delta; do \
linstor storage-pool create $n drbdpool lvmthin pve/drbdpool; \
done

DRBD resource creation

After that we are ready to create our first real DRBD resource:

linstor resource-definition create first
linstor volume-definition create first 10M --storage-pool drbdpool
linstor resource create alpha first
linstor resource create bravo first

Now, check with drbdadm status that  “alpha” and “bravo” contain a replicated DRBD resource called “first”. After that this dummy resource can be deleted on all nodes by deleting its resource definition:

linstor resource-definition delete -q first

LINSTOR Proxmox VE Plugin Setup

As DRBD and LINSTOR are already set up, the only things missing is installing the plugin itself and its configuration.

apt install linstor-proxmox

The plugin is configured via the file /etc/pve/storage.cfg:

drbd: drbdstorage
content images, rootdir
redundancy 2 controller 10.0.0.1

It is not necessary to copy that file to the other nodes, as /etc/pve is already a replicated file system. After the configuration is done, you should restart the following service:

systemctl restart pvedaemon

After this setup is done, you are able to create virtual machines backed by DRBD from the GUI. To do so, select “drbdstorage” as storage in the “Hard Disk” section of the VM. LINSTOR selects the nodes that have the most free storage to create the replicated backing devices.

Distribution

The interested reader can check which ones were selected via LINSTOR resource list. While interesting, it is important to know that the storage can be accessed by all nodes in the cluster via a DRBD feature called “diskless clients”. So let’s assume “alpha” and “bravo” had the most free space and were selected, and the VM was created on node “bravo”. Via the low level tool drbdadm status we now see that the resource is created on two nodes (i.e., “alpha” and “bravo”) and the DRBD resource is in “Primary” role on “bravo”.

Now we want to migrate the VM from “bravo” to node “charlie”. This is again done via a few clicks in the GUI, but the interesting steps happen behind the scene: The storage plugin realizes that it has access to the data on “alpha” and “bravo” (our two replicas) but also needs access on “charlie” to execute the VM. The plugin therefore creates a diskless assignment on “charlie”. When you execute drbdadm status on “charlie”, you see that now three nodes are involved in the overall picture:

• Alpha with storage in Secondary role
• Bravo with storage in Secondary role
• Charlie as a diskless client in Primary role

Diskless clients are created (and deleted) on demand without further user interaction, besides moving around VMs in the GUI. This means that if you now move the VM back to “bravo”, the diskless assignment on “charlie” gets deleted as it is no longer needed.

If you would have moved the VM from “charlie” to “delta”, the diskless assignment for “charlie” would have been deleted, and a new one for “delta” would have been created.

For you it is probably even more interesting that all of this including VM migration happens within seconds without moving the actual replicated storage contents.

Next Steps

So far, we created a replicated and highly-available setup for our VMs, but the LINSTOR controller and especially its database are not highly-available. In a future blog post, we will describe how to make the controller itself highly-available by only using software already included in Proxmox VE (i.e., without introducing complex technologies like Pacemaker). This will be achieved with a dedicated controller VM that will be provided by LINBIT as an appliance.

Roland Kammerer
Software Engineer at Linbit
Roland Kammerer studied technical computer science at the Vienna University of Technology and graduated with distinction. Currently, he is a PhD candidate with a research focus on time-triggered realtime-systems and works for LINBIT in the DRBD development team.
X