
LINSTOR grows beyond DRBD

For quite some time, LINSTOR has been able to use NVMe-oF storage targets via the Swordfish API. This was expressed in LINSTOR as a resource definition containing a single resource with one backing disk (the NVMe-oF target) and one diskless resource (the NVMe-oF initiator).

Layers in the storage stack

In the last few months the team has been busy making LINSTOR more generic, adding support for resource templates. A resource template describes a storage stack in terms of layers for specific resources/volumes. Here are some examples of such storage stacks:

    • DRBD on top of logical volumes (LVM)
    • DRBD on top of zvols (ZFS)
    • Swordfish initiator & target on top of logical volumes (LVM)
    • DRBD on top of LUKS on top of logical volumes (LVM)
    • LVM only

The team came up with an elegant approach that introduces these additional resource templates in ways that allow existing LINSTOR configurations to keep their semantics as the default resource templates.

With this decoupling, we no longer need to have DRBD installed on LINSTOR clusters that do not require the replication functions of DRBD.
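As a rough sketch of how such a stack can be expressed, newer LINSTOR clients accept a layer list when creating a resource definition. The resource names, storage pool name, and sizes below are examples, and the --layer-list option may not be available in older releases:

# A DRBD-on-LUKS-on-LVM stack for an example resource "secure_r0"
linstor resource-definition create secure_r0 --layer-list drbd,luks,storage
linstor volume-definition create secure_r0 20G
linstor resource create secure_r0 --auto-place 2 --storage-pool thinpool

# A plain LVM-only stack, usable on clusters without DRBD installed
linstor resource-definition create plain_r0 --layer-list storage
linstor volume-definition create plain_r0 20G
linstor resource create plain_r0 --auto-place 1 --storage-pool thinpool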

What does that mean for DRBD?

The interests of LINBIT’s customers vary widely. Some want to use LINSTOR without DRBD – which is now supported. A very prominent example of this is Intel, who uses LINSTOR in its Rack Scale Design effort to connect storage nodes and compute nodes with NVMe-oF. In this example, the storage is disaggregated from the other nodes.

Other customers see converged architectures as a better fit. For converged scenarios, DRBD has many advantages over a pure data access protocol such as NVMe-oF. LINSTOR is built from the ground up to manage DRBD; therefore, the need for DRBD support will remain.

Linux-native NVMe-oF and NVMe/TCP

SNIA’s Swordfish has clear benefits as a standard for managing storage targets: it allows for optimized storage target implementations, a hardware-accelerated data path, and a non-Linux control path.

Because Swordfish is an extension of Redfish, which needs to be implemented in the Baseboard Management Controller (BMC), we have decided to extend LINSTOR’s driver set to configure NVMe-oF target and initiator software directly. We do this by utilizing existing tools found within the Linux operating system, eliminating the need for a Swordfish software stack.

Summary

LINSTOR now supports configurations without DRBD. It is now a unified storage orchestrator for replicated and non-replicated storage.

Philipp Reisner on Linkedin
Philipp Reisner
Philipp Reisner is founder and CEO of LINBIT in Vienna/Austria. His professional career has been dominated by developing DRBD, a storage replication for Linux. Today he leads a company of about 30 employees with locations in Vienna/Austria and Portland/Oregon.

Speed Up! NVMe-oF for LINSTOR

What is NVMe?

The storage world has gained a number of new terms in the last few years. Let’s start with NVMe. The abbreviation stands for Non-Volatile Memory Express, which isn’t very self-explanatory. It all began a few years back when NAND flash started to make major inroads into the storage industry, and the new storage medium needed to be accessed through existing interfaces like SATA and Serial Attached SCSI (SAS).

Back at that time, FusionIO created a NAND-flash-based SSD that plugged directly into the PCIe slot of a server. This eliminated the bottleneck of the ATA and SCSI command sets and of interfaces coming from a time of rotating storage media.

The FusionIO products shipped with proprietary drivers, so the industry set out to create an open standard suited to the performance of NAND flash. One of the organizations where the players of the industry meet, align, and create standards is the Storage Networking Industry Association (SNIA).

The first NVMe standard was published in 2013, and it describes a PCIe-based interface and command set to access fast storage. This can be thought of as a cleaned up version of the ATA or SCSI commands plus a PCIe interface.

What is NVMe-oF and NVMe/TCP?

Similar to what iSCSI is to SCSI, NVMe-oF and NVMe/TCP are standards that describe how to send NVMe commands over networks. NVMe-oF requires an RDMA-capable network (like InfiniBand or RoCE), while NVMe/TCP works on every network that can carry IP traffic.

There are two terms to be aware of: 1) the initiator is where the applications that want to access the data run. Linux comes with a built-in initiator; other OSes either already have initiators or will have them soon.

And, 2) the target is where the data is stored. Linux comes with a software target built into the kernel. It might not be obvious, but any Linux block device can be made available as an NVMe-oF target using the Linux target software; it is not limited to NVMe devices.
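To make that concrete, here is a sketch of exporting an existing block device with the in-kernel target over NVMe/TCP and connecting to it from an initiator. It assumes a kernel that ships the nvmet and nvme-tcp modules and the nvme-cli tool; the device path, IP address, port, and NQN are examples (nvmetcli can script the target side for you):

# On the target node: export /dev/sdb through the in-kernel target (configfs)
modprobe nvmet
modprobe nvmet-tcp
cd /sys/kernel/config/nvmet
mkdir -p subsystems/example-nqn/namespaces/1
echo 1 > subsystems/example-nqn/attr_allow_any_host
echo /dev/sdb > subsystems/example-nqn/namespaces/1/device_path
echo 1 > subsystems/example-nqn/namespaces/1/enable
mkdir ports/1
echo tcp > ports/1/addr_trtype
echo ipv4 > ports/1/addr_adrfam
echo 192.0.2.10 > ports/1/addr_traddr
echo 4420 > ports/1/addr_trsvcid
ln -s /sys/kernel/config/nvmet/subsystems/example-nqn ports/1/subsystems/example-nqn

# On the initiator node: connect, then use the device like any local disk
modprobe nvme-tcp
nvme connect -t tcp -a 192.0.2.10 -s 4420 -n example-nqn
nvme list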

What does this have to do with Swordfish?

While the iSCSI or NVMe-oF standards describe how READ, WRITE, and other operations on block data are shipped from the initiator to the target, they do not describe how a target (volume) gets created or configured. For too many years, this was the realm of vendor-specific APIs and GUIs.

SNIA’s Swordfish standard describes how to manage storage targets and make them accessible as NVMe-oF targets. It is a REST API with JSON data. As such, it is easy to understand and embrace.
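As a sketch of what that looks like on the wire: the Redfish service root lives under /redfish/v1, and Swordfish adds storage collections below it. The host name, credentials, and exact resource paths here are examples and vary between implementations:

# Browse the Redfish service root and the Swordfish storage services
curl -k -u admin:secret https://bmc.example.com/redfish/v1/
curl -k -u admin:secret https://bmc.example.com/redfish/v1/StorageServices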

The major drawback of Swordfish is that it is defined as an extension of Redfish. Redfish is a standard for managing servers over the network; it can be thought of as a modernized IPMI. As such, Redfish will usually be implemented on a Baseboard Management Controller (BMC). While Redfish has its advantages over IPMI, it does not provide something completely new.

On the other hand, Swordfish is something that was not there before. But as it is an extension of Redfish, implementing it usually means that the machine needs a Redfish-enabled BMC, which may hinder or slow down the adoption of Swordfish.

LINSTOR

Since version 0.7, LINSTOR has been capable of working with storage provided by Swordfish-compliant storage targets, as well as with their initiator counterparts.

Summary

LINSTOR has gained the capability of managing storage on Swordfish/NVMe-oF targets, in addition to working with DRBD and direct-attached storage on Linux servers.

Philipp Reisner on Linkedin
Philipp Reisner
Philipp Reisner is founder and CEO of LINBIT in Vienna/Austria. His professional career has been dominated by developing DRBD, a storage replication for Linux. Today he leads a company of about 30 employees with locations in Vienna/Austria and Portland/Oregon.


High Level Resource API – The simplicity of creating replicated volumes

In this blog post, we present one of our recent extensions to the LINSTOR ecosystem: A high-level, user-friendly Python API that allows simple DRBD resource management via LINSTOR.

Background: Until now, LINSTOR components have communicated via Protocol Buffers or via the Python API that is used in the linstor command line client. Protocol Buffers are a great way to transport serialized structured data between LINSTOR components, but by themselves they don’t provide the necessary abstraction for developers.

That is not the job of Protocol Buffers. Since the early days we have split the command line client into the client logic (parsing configuration files, parsing command line arguments, …) and a Python library (python-linstor). This Python library provides all the bits and pieces to interact with LINSTOR. For example, it provides a MultiLinstor class that handles TCP/IP communication with the LINSTOR controller. Additionally, it allows all the operations that are possible with LINSTOR (e.g. creating nodes, creating storage pools, …). For perfectly valid reasons this API is very low level and pretty close to the actual Protocol Buffer messages sent to the LINSTOR controller.

By developing more and more plugins to integrate LINSTOR into other projects like OpenStack, OpenNebula, Docker Volumes, and many more, we saw that there is a need for a higher-level abstraction.

Finding the Right Abstraction

The first dimension of abstraction is to abstract from LINSTOR internals. For example, it makes perfect sense that recreating an existing resource is an error on a low level (think of it as EEXIST). On a higher level, depending on the actual object, trying to recreate an object might be perfectly fine, and one simply wants to get the existing object back (i.e. idempotency).

The second dimension of abstraction is from DRBD and LINSTOR as a whole. Developers dealing with storage already have a good knowledge of concepts like nodes, storage pools, resources, volumes, placement policies… This is the part where we can make LINSTOR and DRBD accessible to new developers.

The third goal was to only provide a set of objects that are important in the context of the user/developer. This, for example, means that we can assume that the LINSTOR cluster is already set up, so we do not need to provide a high-level API to add nodes. For the higher-level API we can focus on [LINSTOR] resources. This allows us to satisfy the KISS (keep-it-simple-stupid) principle. A fourth goal was to introduce new, higher-level concepts like placement policies. Placement policies/templates are concepts currently being developed in core LINSTOR, but we can already provide the basics on a higher level.

Demo Time

We start by creating a 10 GB replicated LINSTOR/DRBD volume in a 3-node cluster. We want the volume to be two times redundant. Then we increase the size of the volume to 20 GB.

$ python
>>> import linstor

>>> foo = linstor.Resource('foo')

>>> foo.volumes[0] = linstor.Volume("10 GB")

There are multiple ways to specify the size.

>>> foo.placement.redundancy = 2

>>> foo.autoplace()

>>> foo.volumes[0].size += 10 * (2 ** 30)

This line is enough to resize a replicated volume cluster-wide.

We needed 5 lines of code to create a replicated DRBD volume in a cluster! Let that sink in for a moment and compare it to the steps that were necessary without LINSTOR: Creating backing devices on all nodes, writing and synchronizing DRBD res(ource) files, creating meta-data on all nodes, drbdadm up the resource and force one to the Primary role to start the initial sync.
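For comparison, here is a sketch of that manual procedure; the volume group, resource name, and size are examples:

# Create the backing device on each node
lvcreate -n foo_00 -L 10G vg0

# Write /etc/drbd.d/foo.res, copy it to all nodes, then on each node:
drbdadm create-md foo
drbdadm up foo

# On exactly one node, force it to the Primary role to start the initial sync
drbdadm primary --force foo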

For the next step we assume that the volume is replicated and that we are a storage plugin developer. Our goal is to make sure the volume is accessible on every node because the block device should be used in a VM. So, A) make sure we can access the block device, and B) find out what the name of the block device of the first volume actually is:

>>> import socket
>>> foo.activate(socket.gethostname())

>>> print(foo.volumes[0].device_path)

The activate method is a good example of the kind of abstraction we intended. Note that we autoplaced the resource 2 times in a 3-node cluster, so LINSTOR chose the nodes that fit best. But now we want the resource to be accessible on every node without increasing the redundancy to 3 (because that would need additional storage, and 2 times replicated data is good enough).

Diskless clients

Fortunately DRBD has us covered as it has the concept of diskless clients. These nodes provide a local block device as usual, but they read and write data from/to their peers only over the network (i.e. no local storage). Creating this diskless assignment is not necessary if the node was already part of the replication in the first place (then it already has access to the data locally).

This is exactly what activate does: If the node can already access the data – fine, if not, create a diskless assignment. Now assume we are done and we do not need access to the device anymore. We want to do some cleanup because we do not need a diskless assignment:

>>> foo.deactivate(socket.gethostname()) 

The semantics of this method are to remove the assignment if it is diskless (as it does not contribute to actual redundancy), but if the node stores actual data, deactivate does nothing and keeps the data as redundant as it was. This is only a very small subset of the functionality the high-level API provides; there is a lot more to discover, like creating snapshots, converting diskless assignments to diskful ones and vice versa, or managing DRBD Proxy. For more information, check the online documentation.

If you want to go deeper into the LINSTOR universe, please visit our YouTube channel.

Roland Kammerer
Software Engineer at Linbit
Roland Kammerer studied technical computer science at the Vienna University of Technology and graduated with distinction. Currently, he is a PhD candidate with a research focus on time-triggered real-time systems and works for LINBIT in the DRBD development team.

LINSTOR CSI Plugin for Kubernetes

A few weeks ago, LINBIT publicly released the LINSTOR CSI (Container Storage Interface) plugin. This means LINSTOR now has a standardized way of working with any container orchestration platform that supports CSI. Kubernetes is one of those platforms, so our developers put in the work to make LINSTOR integration with Kubernetes easy, and I’ll show you how!

You’ll need a couple of things to get started:

  • Kubernetes Cluster (1.12.x or newer)
  • LINSTOR Cluster

LINSTOR’s CSI plugin requires certain Kubernetes feature gates be enabled on the kube-apiserver and each kubelet.

Enable the CSINodeInfo and CSIDriverRegistry feature gates on the kube-apiserver by adding --feature-gates=KubeletPluginsWatcher=true,CSINodeInfo=true to the list of arguments passed to the kube-apiserver system pod in the /etc/kubernetes/manifests/kube-apiserver.yaml manifest. It should look something like this:

# cat /etc/kubernetes/manifests/kube-apiserver.yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""
  creationTimestamp: null
  labels:
    component: kube-apiserver
    tier: control-plane
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-apiserver
    ... snip ...
    - --feature-gates=KubeletPluginsWatcher=true,CSINodeInfo=true
    ... snip ...

To enable these feature gates on the kubelet, you’ll need to add the following argument to the KUBELET_EXTRA_ARGS variable located in /etc/sysconfig/kubelet: --feature-gates=CSINodeInfo=true,CSIDriverRegistry=true. Your config should look something like this:

# cat /etc/sysconfig/kubelet 
KUBELET_EXTRA_ARGS="--feature-gates=CSINodeInfo=true,CSIDriverRegistry=true"

Once you’ve modified those two configurations, you can prepare your configuration for the CSI plugin’s sidecar containers. curl down the latest version of the plugin definition:

# curl -O \
https://raw.githubusercontent.com/LINBIT/linstor-csi/master/examples/k8s/deploy/linstor-csi.yaml

Set the value of each instance of LINSTOR-IP in linstor-csi.yaml to the IP address of your LINSTOR Controller. The placeholder IP in the example yaml is 192.168.100.100, so we can use the following command to update this address (or you can edit it with an editor); simply set CON_IP to your controller’s IP address:

# CON_IP="x.x.x.x"; sed -i.example s/192\.168\.100\.100/$CON_IP/g linstor-csi.yaml

Finally, apply the yaml to the Kubernetes cluster:

# kubectl apply -f linstor-csi.yaml

You should now see the linstor-csi sidecar pods running in the kube-system namespace:

# watch -n1 -d kubectl get pods --namespace=kube-system --output=wide

Once it is running, you can define Kubernetes storage classes that point to LINSTOR storage pools, from which you can then provision persistent volumes for your containers, optionally replicated by DRBD.

Here is an example yaml definition for a storage class that uses a LINSTOR storage pool in my cluster named thin-lvm:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-autoplace-1-thin-lvm
provisioner: io.drbd.linstor-csi
parameters:
  autoPlace: "1"
  storagePool: "thin-lvm"

And here is an example yaml definition for a persistent volume claim carved out of the above storage class:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    volume.beta.kubernetes.io/storage-class: linstor-autoplace-1-thin-lvm
  name: linstor-csi-pvc-0
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

Put it all together and you’ve got yourself an open source, high performance, block device provisioner for your persistent workloads in Kubernetes!

There are many ways to craft your storage class definitions for node selection, storage tiering, diskless attachments, or even off-site replicas. We’ll be working on our documentation surrounding new features, so stay tuned, and don’t hesitate to reach out for the most UpToDate information about LINBIT’s software!

Read more: CSI Plugin for LINSTOR Complete.

Matt Kereczman on Linkedin
Matt Kereczman
Matt is a Linux Cluster Engineer at LINBIT with a long history of Linux System Administration and Linux System Engineering. Matt is a cornerstone in LINBIT’s support team, and plays an important role in making LINBIT’s support great. Matt was President of the GNU/Linux Club at Northampton Area Community College prior to graduating with Honors from Pennsylvania College of Technology with a BS in Information Security. Open Source Software and Hardware are at the core of most of Matt’s hobbies.

How to Setup LINSTOR in OpenStack

This post will walk through the installation and setup procedures for deploying LINSTOR as a persistent, replicated, and high-performance source of block storage within a DevStack-based OpenStack environment running on an Ubuntu host. We will refer to this Ubuntu host as the LINSTOR Controller. This setup also requires at least one additional Ubuntu node handling replicated data, and we will refer to this node as the LINSTOR Satellite. You may have more than one satellite node for increased redundancy.

Initial Requirement

The LINSTOR driver is a messenger between the underlying DRBD/LINSTOR and OpenStack. Therefore, both DRBD/LINSTOR as well as OpenStack must be pre-installed and configured. Once LINSTOR is installed, each node must be registered with LINSTOR and have a predefined storage pool on a thin LVM volume.

Install DRBD / LINSTOR on OpenStack Cinder node as a LINSTOR Controller node

# First, download and run a python script to enable LINBIT repo
curl -O 'https://my.linbit.com/linbit-manage-node.py'
chmod u+x linbit-manage-node.py
./linbit-manage-node.py

# Install the DRBD, LINSTOR, and LVM packages
sudo apt install -y drbd-dkms lvm2
sudo apt install -y linstor-controller linstor-satellite linstor-client
sudo apt install -y drbdtop

Configure the LINSTOR Controller

# Start both LINSTOR Controller and Satellite Services
systemctl enable linstor-controller.service
systemctl start linstor-controller.service
systemctl enable linstor-satellite.service
systemctl start linstor-satellite.service

# Create backend storage for DRBD/LINSTOR by creating a Volume Group 'drbdpool'
# Specify appropriate volume location (/dev/vdb)
sudo vgcreate drbdpool /dev/vdb

# Create a Logical Volume 'thinpool' within 'drbdpool'
# Specify appropriate thin volume size (64G)
sudo lvcreate -L 64G -T drbdpool/thinpool

Install DRBD / LINSTOR on all other LINSTOR Satellite node(s)

# First enable the LINBIT repository by running the linbit-manage-node.py
# script (as on the controller node), then install the packages
sudo apt install -y drbd-dkms lvm2
sudo apt install -y linstor-satellite
sudo apt install -y drbdtop

Configure the LINSTOR Satellite node(s)

# Start LINSTOR Satellite Service
systemctl enable linstor-satellite.service
systemctl start linstor-satellite.service

# Create backend storage for DRBD/LINSTOR by creating a Volume Group 'drbdpool'
# Specify appropriate volume location (/dev/vdb)
sudo vgcreate drbdpool /dev/vdb

# Create a Logical Volume 'thinpool' within 'drbdpool'
# Specify appropriate thin volume size (64G)
sudo lvcreate -L 64G -T drbdpool/thinpool

Configure LINSTOR cluster (nodes and storage pool definitions) from the Controller node

# Create the controller node as combined controller and satellite node
linstor node create cinder-node-name 192.168.1.100 --node-type Combined

# Create the satellite node(s)
linstor node create another-linstor-node 192.168.1.101
# repeat to add more satellite nodes in the LINSTOR cluster

# Create a LINSTOR storage pool on each node
# For each node, specify the node name, the storage pool name (DfltStorPool),
# the driver (lvmthin) and the backing volume group/thin pool (drbdpool/thinpool)

# On the LINSTOR Controller 
linstor storage-pool create lvmthin cinder-node-name DfltStorPool \
    drbdpool/thinpool
# On the LINSTOR Satellite node(s)
linstor storage-pool create lvmthin another-linstor-node DfltStorPool \
    drbdpool/thinpool
# repeat to add a storage pool to each node in the LINSTOR cluster

 

Cinder Driver Installation & Configuration

Download the latest driver (linstordrv.py)

wget https://github.com/LINBIT/openstack-cinder/blob/stein-linstor/cinder/volume/drivers/linstordrv.py

Install the driver file in the proper destination

/opt/stack/cinder/cinder/volume/drivers/linstordrv.py

Configure OpenStack Cinder by editing /etc/cinder/cinder.conf to enable the LINSTOR driver, adding ‘linstor’ to enabled_backends:

[DEFAULT]
...
enabled_backends=lvm, linstor
…

Then, add a LINSTOR section at the bottom of cinder.conf:

[linstor]
volume_backend_name = linstor
volume_driver = cinder.volume.drivers.linstordrv.LinstorDrbdDriver
linstor_default_volume_group_name=drbdpool
linstor_default_uri=linstor://localhost
linstor_default_storage_pool_name=DfltStorPool
linstor_default_resource_size=1
linstor_volume_downsize_factor=4096
linstor_controller_diskless=False
iscsi_helper=tgtadm

Update Python libraries

sudo pip install protobuf --upgrade
sudo pip install eventlet --upgrade

Register LINSTOR with Cinder

cinder type-create linstor
cinder type-key linstor set volume_backend_name=linstor

Lastly, restart Cinder services

# Restart the DevStack Cinder services (assuming standard DevStack unit names)
sudo systemctl restart devstack@c-api.service
sudo systemctl restart devstack@c-sch.service
sudo systemctl restart devstack@c-vol.service

 

Verification of proper installation

Check system journal for any driver errors

# Check if there is a recurring error after restart
# (assuming standard DevStack unit names for the Cinder services)
sudo journalctl -f -u devstack@c-* | grep error

Create a test volume with LINSTOR backend

# Create a 1GiB volume through Cinder and verify LINSTOR backing exists
openstack volume create --type linstor --size 1 --availability-zone nova \
    linstor-test-vol
openstack volume list
linstor resource list

Delete the test volume

# Delete the test volume and verify if LINSTOR removed resources correctly
openstack volume delete linstor-test-vol
linstor resource list

 

Final Comments

By now, the LINSTOR driver should have successfully created a Cinder volume and the matching LINSTOR resources on the backend, and then removed them from Cinder. From this point on, managing LINSTOR volumes should be a breeze with OpenStack Horizon’s GUI.

Management of LINSTOR snapshots and creation of LINSTOR volumes from those snapshots are also possible. Once a LINSTOR volume becomes available, it can then be made accessible within a Nova instance by creating an attachment. Any LINSTOR-backed volume can then provide replicated and persistent storage.

Please direct any questions regarding the specifics about the driver to Woojay Poynter at [email protected]. For any inquiry regarding DRBD and LINSTOR technology please contact our sales team at [email protected].

Feel free to check out this demonstration of LINSTOR volume management in OpenStack:

 

Woojay Poynter
IO Plumber
Woojay is working on data replication and software-defined-storage with LINSTOR, built on DRBD @LINBIT. He has worked on web development, embedded firmwares, professional culinary education, power carving with ice and wood. He is a proud father and likes to play with legos.

Why you should use LINSTOR in OpenStack

With the LINSTOR volume driver for OpenStack, Linux storage created in OpenStack Cinder can be easily provisioned, managed and seamlessly replicated across a large Linux cluster.

LINSTOR is an open-source storage orchestrator designed to deliver easy-to-use software-defined storage in Linux environments. LINSTOR uses LINBIT’s DRBD to replicate block data with minimal overhead and CPU load. Managing a LINSTOR storage cluster is as easy as a few LINSTOR CLI commands or a few lines of Python code with the LINSTOR API.
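For a flavor of what “a few LINSTOR CLI commands” means, this sketch creates a 10 GiB volume replicated across two nodes; the resource name is an example, and the DfltStorPool storage pool is assumed to exist:

linstor resource-definition create demo_vol
linstor volume-definition create demo_vol 10G
linstor resource create demo_vol --auto-place 2 -s DfltStorPool
linstor resource list-volumes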

LINSTOR pairs with OpenStack

OpenStack paired with LINSTOR brings even greater power and flexibility by enabling Linux to become your SDS platform. Replicate storage wherever you need it with simple mouse clicks. Provision snapshots. Create new volumes with those snapshots. LINSTOR volumes can then be paired with the right compute nodes just as easily. Together, OpenStack and LINSTOR bring tremendous potential to provide robust infrastructure with ease, all powered by open-source.

Data replicated with LINSTOR can minimize downtime and data loss. Running your cloud on commodity hardware with native Linux features underneath provides the most flexible, reliable, and cost-effective solution for hosting customized OpenStack deployments anywhere.

In addition to storage management and replication, LINBIT also offers Geo-Clustering solutions that work with LINSTOR to enable long-distance data replication inside private and public cloud environments.

For a quick recap, please check out this video on deploying LINSTOR volumes with OpenStack’s Horizon GUI.

For more information about LINBIT’s DRBD and LINSTOR, visit:

For LINSTOR OpenStack Drivers:
https://github.com/LINBIT/openstack-cinder

For LINSTOR Driver Documentation:
https://docs.linbit.com/docs/users-guide-9.0/#ch-openstack-linstor

For LINBIT’s LINSTOR webpage:
https://www.linbit.com/en/linstor/

 

Woojay Poynter
IO Plumber
Woojay is working on data replication and software-defined-storage with LINSTOR, built on DRBD @LINBIT. He has worked on web development, embedded firmwares, professional culinary education, power carving with ice and wood. He is a proud father and likes to play with legos.


It’s Official. LINSTOR Volume Driver is Now a Part of OpenStack Cinder.

It’s Official. LINSTOR volume driver is now part of OpenStack.

With this code merge, the LINSTOR volume driver is now officially part of OpenStack and brings a new level of software-defined storage (SDS) service to Cinder, OpenStack’s volume service.

While the next OpenStack release named ‘Stein’ won’t be out until April, the latest LINSTOR driver is already available on our GitHub repo.

Stay tuned for more news and updates from LINBIT.


How to Setup LINSTOR with OpenNebula

This post will guide you through the setup of the LINSTOR – OpenNebula Addon. After completing it, you will be able to easily live-migrate virtual machines between OpenNebula nodes, and additionally, have data redundancy.

Setup Linstor with OpenNebula

This post assumes that you already have OpenNebula installed and running on all of your nodes. First, I will give you a quick guide for installing LINSTOR; for more detailed documentation, please read the DRBD User’s Guide. The second part will show you how to add a LINSTOR image and system datastore to OpenNebula.

We will assume the following node setup:

Node name   IP         Role
alpha       10.0.0.1   Controller/ON front-end
bravo       10.0.0.2   Virtualization host
charlie     10.0.0.3   Virtualization host
delta       10.0.0.4   Virtualization host

Make sure you have configured 2 lvm-thin storage pools named linstorpool/thin_image and linstorpool/thin_system on all of your nodes.
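If you still need to create those pools, the following sketch shows one way to do it; the physical volume /dev/vdb and the sizes are examples, and the commands must be run on every node:

vgcreate linstorpool /dev/vdb
lvcreate -L 100G -T linstorpool/thin_system
lvcreate -L 500G -T linstorpool/thin_image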

LINSTOR setup

Install LINSTOR packages

The easiest setup is to install the linstor-controller on the same node as the OpenNebula cloud front-end. The linstor-opennebula package contains our OpenNebula driver and is therefore essential on the OpenNebula cloud front-end node. On this node, install the following packages:

apt install drbd-dkms drbd-utils python-linstor linstor-satellite linstor-client linstor-controller linstor-opennebula

After the installation completes, start the linstor-controller service and enable it:

systemctl start linstor-controller
systemctl enable linstor-controller

On all other virtualization nodes you do not need the linstor-controller, linstor-client, or linstor-opennebula packages:

apt install drbd-dkms drbd-utils python-linstor linstor-satellite

For all nodes (including the controller) you have to start and enable the linstor-satellite:

systemctl start linstor-satellite
systemctl enable linstor-satellite

Now all LINSTOR-related services should be running.

Adding and configuring LINSTOR nodes

All nodes that should work as virtualization nodes need to be added to LINSTOR, so that storage can be distributed and activated on all nodes:

linstor node create alpha 10.0.0.1 --node-type Combined
linstor node create bravo 10.0.0.2
linstor node create charlie 10.0.0.3
linstor node create delta 10.0.0.4

Now we will configure the system and image lvm-thin pools with LINSTOR:

linstor storage-pool create lvmthin alpha open_system linstorpool/thin_system
linstor storage-pool create lvmthin bravo open_system linstorpool/thin_system
linstor storage-pool create lvmthin charlie open_system linstorpool/thin_system
linstor storage-pool create lvmthin delta open_system linstorpool/thin_system

linstor storage-pool create lvmthin alpha open_image linstorpool/thin_image
linstor storage-pool create lvmthin bravo open_image linstorpool/thin_image
linstor storage-pool create lvmthin charlie open_image linstorpool/thin_image
linstor storage-pool create lvmthin delta open_image linstorpool/thin_image

For testing we can now try to create a dummy test resource:

linstor resource-definition create dummy
linstor volume-definition create dummy 10M
linstor resource create dummy --auto-place 3 -s open_image

If everything went fine with the above commands, you should be able to see a resource created on 3 nodes in the open_image lvm-thin storage pool:

linstor resource list-volumes

Now we can delete the created dummy resource:

linstor resource-definition delete dummy

LINSTOR is now set up and ready to be used by OpenNebula.

OpenNebula LINSTOR datastores

OpenNebula uses different types of datastores: system, image and files.

LINSTOR supports the system and image datastore types.

  • The system datastore is used to store a small context image that holds all the information needed to run a virtual machine (VM) on a node.
  • The image datastore, as its name reveals, stores VM images.

OpenNebula doesn’t need to be configured with a LINSTOR system datastore; it will also work with its default system datastore, but using a LINSTOR system datastore gives it some data redundancy advantages.

Setup LINSTOR datastore drivers

As LINSTOR is an add-on driver for OpenNebula, the LINSTOR OpenNebula driver needs to be registered with it. To do so, modify /etc/one/oned.conf and add linstor to the TM_MAD and DATASTORE_MAD sections.

TM_MAD = [
  executable = "one_tm",
  arguments = "-t 15 -d dummy,lvm,shared,fs_lvm,qcow2,ssh,vmfs,ceph,linstor"
]

Note that for the DATASTORE_MAD section the linstor driver has to be specified twice (image datastore and system datastore).

DATASTORE_MAD = [
    EXECUTABLE = "one_datastore",
    ARGUMENTS  = "-t 15 -d dummy,fs,lvm,ceph,dev,iscsi_libvirt,vcenter,linstor -s shared,ssh,ceph,fs_lvm,qcow2,vcenter,linstor"
]

And finally at the end of the configuration file, add new TM_MAD_CONF and DS_MAD_CONF sections for the linstor driver:

TM_MAD_CONF = [
    name = "linstor", ln_target = "NONE", clone_target = "SELF", shared = "yes", ALLOW_ORPHANS="yes"
]

DS_MAD_CONF = [
    NAME = "linstor", REQUIRED_ATTRS = "BRIDGE_LIST", PERSISTENT_ONLY = "NO", MARKETPLACE_ACTIONS = "export"
]

Now restart the OpenNebula service.

Adding LINSTOR datastore drivers

After we have registered the LINSTOR driver with OpenNebula, we can add the image and system datastores.

For the system datastore we will create a configuration file and add it with the onedatastore tool. If you want to use more than 2 replicas, just edit the LINSTOR_AUTO_PLACE value.

cat >system_ds.conf <<EOI
NAME = linstor_system_auto_place
TM_MAD = linstor
TYPE = SYSTEM_DS
LINSTOR_AUTO_PLACE = 2
LINSTOR_STORAGE_POOL = "open_system"
BRIDGE_LIST = "alpha bravo charlie delta"
EOI

onedatastore create system_ds.conf

And we do nearly the same for the image datastore:

cat >image_ds.conf <<EOI
NAME = linstor_image_auto_place
DS_MAD = linstor
TM_MAD = linstor
TYPE = IMAGE_DS
DISK_TYPE = BLOCK
LINSTOR_AUTO_PLACE = 2
LINSTOR_STORAGE_POOL = "open_image"
BRIDGE_LIST = "alpha bravo charlie delta"
EOI

onedatastore create image_ds.conf

Now you should see 2 new datastores in the OpenNebula web front-end that are ready to use.

Usage and Notes

The new datastores can be used in the usual OpenNebula datastore selections and should support all OpenNebula features.

The LINSTOR datastores also have some configuration options, which are described on the driver’s GitHub repository page.

Data distribution

When an image is created in one of these datastores, LINSTOR_AUTO_PLACE = 2 means that LINSTOR picks two of the four nodes to hold the data. The interested reader can check which nodes were selected via linstor resource list.

linstor resource list

While that is interesting, it is important to know that the storage can be accessed by all nodes in the cluster via a DRBD feature called “diskless clients”. So let’s assume “alpha” and “bravo” had the most free space and were selected, and the VM was created on node “bravo”. Via the low-level tool drbdadm status we now see that the resource is created on two nodes (i.e., “alpha” and “bravo”) and the DRBD resource is in “Primary” role on “bravo”.

Now we want to migrate the VM from “bravo” to node “charlie”. This is again done via a few clicks in the GUI, but the interesting steps happen behind the scenes: The storage plugin realizes that it has access to the data on “alpha” and “bravo” (our two replicas) but also needs access on “charlie” to execute the VM. The plugin therefore creates a diskless assignment on “charlie”. When you execute drbdadm status on “charlie”, you see that now three nodes are involved in the overall picture:

  • Alpha with storage in Secondary role
  • Bravo with storage in Secondary role
  • Charlie as a diskless client in Primary role

Diskless clients are created (and deleted) on demand without further user interaction, besides moving around VMs in the GUI. This means that if you now move the VM back to “bravo”, the diskless assignment on “charlie” gets deleted as it is no longer needed.

If you had moved the VM from “charlie” to “delta”, the diskless assignment for “charlie” would have been deleted, and a new one for “delta” would have been created.
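For illustration, the same diskless assignments that the plugin manages automatically can also be created and removed by hand with the linstor client. The resource name below is an example, and depending on the client version the flag is spelled --diskless or --drbd-diskless:

# Give charlie network-only access to the data of an example resource
linstor resource create charlie one-image-42 --diskless

# Remove the diskless assignment again once it is no longer needed
linstor resource delete charlie one-image-42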

For you it is probably even more interesting that all of this including VM migration happens within seconds without moving the actual replicated storage contents.

Check this for LINSTOR and OpenNebula:

 

 

Rene Peinthor
Software developer
Rene was one of the first developers to see a DRBD resource deployed by LINSTOR and has been a software developer at LINBIT since 2017.
While not squashing bugs in LINSTOR, Rene is either climbing or paragliding down a mountain.

Minimize Downtime During Maintenance

System maintenance, whether planned or in response to failure, is a necessary part of managing infrastructure. Everyone hopes for the former, rather than the latter. We do our system maintenance quarterly here at LINBIT in hopes that the latter is avoided. These maintenance windows are where we install hardware and software updates, test failovers, and give everything a once over to ensure configurations still make sense.

Normally, in the case of planned maintenance, users are left waiting for access while IT does whatever it needs to do. This leads to a bad user experience. In fact, that is precisely what led to this blog post. I was looking for a BIOS update for a motherboard in the server room and was presented with this lovely message:

We are sorry!

I just had a bad user experience. And to further the experience, I have no indication as to when it will be back up or available. I guess I’m supposed to keep checking back until I get what I was looking for… if I remember to.

Here at LINBIT we use DRBD for all of our systems. This ensures that they are always on and always available for the end users and our customers. If for some reason you landed on this site and aren’t familiar with DRBD, it is an open source project developed by us, LINBIT. In its simplest form you can think of it as network RAID 1; however, instead of having independent disks, you have two (or more, if you’re using DRBD 9) independent systems. You essentially now need to lose twice the hardware to experience downtime of services.

One commonly ignored or unrealized benefit of using DRBD is that system maintenance and upgrades can be done with minimal to no interruption of services. The length of the interruption is generally tied to the type of deployment – for example if you’re using virtual machines, live migration can be achieved using DRBD resulting in no downtime. If you’re running services on hardware and they need to be stopped and restarted, your downtime will be limited to the failover time.

So how do we do this? Let’s say you have two servers, Frodo and Sam – Frodo is Primary (running services) and Sam is Secondary. In this example we need to update the BIOS and upgrade the RAM of our servers. Follow these steps (a sketch of the matching cluster-manager commands follows the list):

  1. First put the cluster into maintenance mode
  2. Next power off Sam (the secondary server)
    1. We can now install any upgrades or hardware we need to
    2. Power the system up, enter the BIOS and make sure everything is OK
    3. Reboot and update the BIOS
  3. Boot Sam into the OS
    1. At this point you can install any OS updates and reboot again if needed
  4. Once Sam is back up and everything is verified to be in good condition, bring the cluster out of maintenance mode
  5. Now migrate services to Sam – again, depending on how things are configured, this may or may not cause a few seconds of unavailability of services
  6. Repeat steps 1-4 for Frodo
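Assuming a Pacemaker cluster managed with pcs (other cluster managers have equivalent commands), steps 1, 4, and 5 might look roughly like this sketch; the resource name “services” and the node name are examples:

# Step 1: put the cluster into maintenance mode
pcs property set maintenance-mode=true

# Step 4: bring the cluster out of maintenance mode again
pcs property set maintenance-mode=false

# Step 5: migrate the services over to Sam
pcs resource move services sam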

There you have it, one of the better kept secret benefits of using DRBD.
