Posts

Kubernetes Operator: LINSTOR’s Little Helper

Before we describe what our LINSTOR Operator does, it is a good idea to discuss what a Kubernetes Operator actually is. If you are already familiar with Kubernetes Operators, feel free to skip the introduction.

Introduction

CoreOS describes Operators like this:

An Operator is a method of packaging, deploying and managing a Kubernetes application. A Kubernetes application is an application that is both deployed on Kubernetes and managed using the Kubernetes APIs and  kubectl tooling.

That is quite a lot to grasp if you are new to the concept of Operators. What users want to do is their work, without having to worry about setting up infrastructure. Still, there is more to a software lifecycle than just firing up the application in a cluster once. This is where an Operator comes into play. In my opinion, a good analogy is to think of a Kubernetes Operator as an actual human operator. So what would the responsibilities of such a human operator be?

A human operator would be an expert in the business logic of the software that she runs and the dependencies that need to be fulfilled to run the software. Additionally, the operator would be responsible for configuring the software: to scale it, to upgrade it, to make backups, and so on. This is also the responsibility of a Kubernetes Operator implemented in software. It is a software component that is built by experts for a particular containerized software. This Operator is executed by an administrator who isn’t necessarily an expert for that particular software.

A Kubernetes Operator is executed in the Kubernetes cluster itself. In contrast to shell scripts or Ansbile playbooks, which are pretty generic, the Operator framework has one specific purpose, as well as access to Kubernetes cluster information. Additionally, they are managed by standard Kubernetes tools and not some external tools for configuration management. An Operator has a managed lifecycle on its own and is managed by the lifecycle manager.

LINSTOR Operator

CoreOS FAQ contains the following sentence:

Experience has shown that the creation of an Operator typically starts by automating an application’s installation and self-service provisioning capabilities, and then evolves to take on more complex automation.

This pretty much sums up the current state of the linstor-operator. The project is still very young and the current focus was on automating the setup of the LINSTOR cluster. If you are familiar with LINSTOR, you know that there is a central component named the LINSTOR controller, and workers — named LINSTOR satellites — that actually create LVM volumes and configure DRBD to provide data redundancy.

Currently, the Operator can add new nodes/kubelets to the LINSTOR cluster by registering them with their name and network interface configuration by the LINSTOR controller. A second important task is that a LINSTOR satellite can actually provide storage to the cluster. Therefore, the linstor-operator can also register one or multiple storage pools. Metrics of the LINSTOR satellite set’s storage pools are exposed by the operator. Therefore the system administrator saves a lot of time, because the LINSTOR Kubernetes Operator automates a lot of standard tasks, which were previously performed manually.

Future work

There is a lot of possible future work that can be done. Some tasks are obvious, some will be driven by actual input of our users. For example we think that configuring storage is one of the pain points for our uses. Therefore, having a sidecar container that can discover and prepare storage pools to be consumed by LINSTOR might be a good idea. In a dynamic environment such as Kubernetes it might be worthwhile to handle nodes failures in clever ways. We also have container images that can inject the DRBD kernel module into the running host kernel, which could help users with getting started with DRBD. High Availability is always an important topic and related to that using etcd as database backend. Further, we want to tackle one of the core tasks of Operators, which is managing upgrades from one LINSTOR controller version to the next.

Thanks to Hayley for reviewing this blog post, she is too busy doing the actual work. This is just a subset of the capabilities we plan for the linstor-operator. Stay tuned for more information!

 

Roland Kammerer
Software Engineer at Linbit
Roland Kammerer studied technical computer science at the Vienna University of Technology and graduated with distinction. Currently, he is a PhD candidate with a research focus on time-triggered realtime-systems and works for LINBIT in the DRBD development team.

DRBD/LINSTOR vs Ceph – a technical comparison

INTRODUCTION

The aim of this article is to give you some insight into CEPH, DRBD and LINSTOR by outlining their basic functions. The following points should help you compare these products and to understand which is the right solution for your system. Before we start, you should be aware of the fact that LINSTOR is made for DRBD and that it is highly recommended for you to use LINSTOR if you are also using DRBD.

DRBD

DRBD works by inserting a thin layer in between the file system, the buffer cache, and the disk driver. The DRBD kernel module captures all requests from the file system and splits them down into two paths. So, how does the actual communication occur? How do two separate servers optimize data protection?

DRBD facilitates communication by mirroring two separate servers. One server, although passive, is usually a direct copy of the other. Any data written to the primary server is simultaneously copied to the secondary server through a real-time communication system. The passive server immediately replicates any changes made in the data.

DRBD 8.x works on two nodes at a time. One is given the role of the primary node while the other is given a secondary role. Reads and writes can only occur on the primary node.

THE BENEFITS OF DRBD 9

The features of DRBD 9.x are a vast improvement over the 8.x version. It is now possible to have up to 32 replicas, including the primary node. This gives you the ability to build your cluster setup with what we call diskless nodes, meaning you don’t have to use storage on your primary node. The primary node in diskless mode still has a DRBD block device, but the data is accessed on the secondary nodes over the network.

The secondary nodes must not mount the file system, not even in read-only mode. While it is true to say that the secondary nodes see all updates on the primary node, they can’t expose these updates to the file system, as DRBD is completely file system agnostic.

One write goes to the actual disk and another to the mirrored disks on a peer node. If the first one fails, the file system can be displayed on one of the opposing nodes and the data will be available for use.

DRBD has no precise knowledge of the file system and, as such, it has no way of communicating the changes upstream to the file system driver. The two-at-a-time rule does not actually limit DRBD from operating on more than two nodes.

Moreover, DRBD-9.x supports multiple peer nodes, meaning one peer might be a synchronous mirror in the local data-center while another secondary might be an asynchronous mirror in a remote site.

Again, the passive server only becomes functional when the primary one fails. When such a failure occurs, Pacemaker immediately recognizes the mishap and shifts to the secondary server. This shifting process, nevertheless, is optional – it can be either manual or automatic. For users who prefer manual, one is required to authorize the system to shift to the passive server when the primary one fails.

LINSTOR

In greater IT infrastructures, cluster managing software is state of the art. This is why LINBIT developed LINSTOR, a software on top of DRBD. DRBD itself is a perfect tool to replicate and access your data, especially when it comes to performance. LINSTOR makes configuring DRBD on a system with more than a few nodes an easy task. LINSTOR manages DRBD and gives you the ability to set it up on a large system.

LINSTOR uses a controller service for managing your cluster and a satellite service which runs on every node for deploying DRBD. The controller can be accessed from every node and enables you to monitor and configure your structure quickly. It can be controlled over REST from the outside and provides a very clear CLI. Furthermore, the LINSTOR REST-API gives you the ability to use LINSTOR volumes in Kubernetes, Proxmox VE, OpenNebula and Openstack.

LINSTOR has a feature to maintain the system at work: There is a separation of control plane vs. data plane. If you wanna upgrade or maintain LINSTOR, there is no downtime of the volumes. In comparison with Ceph, DRBD & LINSTOR are easier to troubleshoot, recover, repair, debug, and easier to intervene manually if required, also mainly due to its simplicity. For sys admin the better maintainability and a less complex environment can be crucial. The higher availability also results in a better reliability. For instance DRBD can be started/stopped manually even if LINSTOR is offline, or, for recovery purposes, even without DRBD installed (simply mount backend storage) – compared to that, trying to find any of your data on disks managed by Ceph can be a quite challenge if your Ceph system is down.

In summary, if you’re looking for increased performance, fast configuration, and filesystem-based storage for your applications, use LINSTOR and DRBD. If you’re looking to run LINSTOR with HA, however, you must use a third-party software such as Pacemaker.

 

CEPH

CEPH is an open source software intended to provide highly scalable object, block, and file-based storage in a unified system.

CEPH consists of a RADOS cluster and its interfaces. The RADOS cluster is a system with services for monitoring and storing data across many nodes. CEPH/RADOS is an object storage cluster with no single-point of failure. This is solved by using an algorithm which cuts the data into blocks and spreads them across the RADOS cluster by using self-managing services. The CRUSH algorithm is used to spread the data on upload and to put the blocks together if an object is requested. CEPH is able to use simple data replication as well as erasure coding for those striped blocks.

On top of the RADOS cluster, LIBRADOS is used to upload or request data from the cluster. CEPH uses LIBRADOS for interfaces CEPHFS, RBD and RADOSGW.

CEPHFS gives you the ability to create a filesystem on a host where the data is stored in the CEPH cluster. Additionally, for using CEPHFS, CEPH needs metadata servers which manage the metadata and balance the load for requests among each other.

RBD or RADOS block device is used for creating virtual block devices on hosts with a CEPH cluster, managing and storing the data in the background. Since RBD is built on LIBRADOS, RBD inherits LIBRADOS’s abilities, including read only snapshots and reverts to snapshot. By striping images across the cluster, CEPH improves read access performance for large block device images. The block device can be virtualized, providing block storage to virtual machines in virtualization platforms such as Apache CloudStack, OpenStack, OpenNebula, Ganeti, and Proxmox Virtual Environment.

RADOSGW is the REST-API for communicating with CEPH/RADOS when uploading and requesting data from the cluster.

In general, CEPH is an object storage cluster with the advantage that you do not have to worry about failing nodes or storage drives, because CEPH recognizes failing devices and replicates the data instantly to another disk where it will be accessed. This also leads to a heavy network load if the devices fail.

Striping data comes with a disadvantage in that it is not possible to access the data on a storage drive by mounting it somewhere else or without a working CEPH cluster.

In conclusion, CEPH is the right solution if you are looking for object storage in your infrastructure. Due to its complexity, you have to expect less performance in comparison to DRBD which is only limited by your network speed. 

Daniel Kaltenböck on Email
Daniel Kaltenböck
Software Engineer at LINBIT HA Solutions GmbH
Daniel Kaltenböck studied technical computer science at the Vienna University of Technology. He is a software engineer by heart with a special focus on software defined storage.
Container-kubernetes-linsto

Containerize LINSTOR

LINBIT and its Software-Defined Storage (SDS) solution LINSTOR has provided integration with Linux containers for quite some time. These range from a Docker volume plugin, to a Flexvolume plugin, and recently, a CSI plugin for Kubernetes. While we always provided excellent integration to the container world, most of our software itself was not available as a container/base image. Containerizing our services is a non-trivial task. As you probably know, the core of the DRBD software consists of a Linux kernel module and user space utilities that interact via netlink with this kernel module. Additionally, our software needs to create LVM devices and DRBD block devices within a container. These tasks are interesting and challenging to put into containers. For this article, we assume 3 nodes, one node that acts as a LINSTOR controller, and two that act as satellites. We tested this with recent Centos7 machines and with a current version of Docker.

Prerequisites

In this article, we assume access to our Docker registry hosted on drbd.io. On all hosts you should run the following commands:

docker login drbd.io
Username: YourUserName
Password: YourPassword
Login Succeeded

Installing the DRBD kernel modules

We need the DRBD kernel module and its dependencies on the LINSTOR satellites (the controller does not need access to DRBD). For that we provide a solution for the most common platforms, namely Centos7/RHEL7 and Ubuntu Bionic.

docker run --privileged -it --rm \
  -v /lib/modules:/lib/modules drbd.io/drbd9:rhel7
DRBD modul sucessfully loaded 

What this does is check which kernel is actually executed on the host, then found it the most appropriate package in the container and installed it. We ship the same, unmodified rpm/deb packages in the container as we provide in our customer repositories. If you are using Ubuntu Bionic, you should use the drbd.io/drbd9:bionic container.

Running a LINSTOR controller

docker run -d --name=linstor-controller \
  -p 3376:3376 -p 3377:3377 drbd.io/linstor-controller

The controller does not have any special requirements, it just needs to be accessible to the client via TCP/IP. Please note that in this configuration the controller’s database is not persisted. One possibility is to bind-mount the directory used for the controller’s database by adding
-v /some/dir:/var/lib/linstor .

Running a LINSTOR satellite

docker run -d --name=linstor-satellite --net=host \
 --privileged drbd.io/linstor-satellite 

The satellite is the component that creates actual block devices. On one hand the backing devices (usually LVM) and the actual DRBD block devices. Therefore this container needs access to/dev, and it needs to share the host networking. Host networking is required for the communication between drbdsetup and the actual kernel module.

Configuring the Cluster

We have to set up LINSTOR as usual, which fortunately, is an easy task and has to be done only once. In the spirit of this blog post, let’s use a containerized LINSTOR client as well. As the client obviously has to talk to the controller, we need to tell the client in the container where to find the controller. This is done by setting the environment variable LS_CONTROLLERS.

docker run -it --rm -e LS_CONTROLLERS=Controller \ 
  drbd.io/linstor-client interactive
  ...
- volume-definition (vd)
LINSTOR ==> node create Satellite1 172.42.42.10
LINSTOR ==> node create Satellite2 172.42.42.20
LINSTOR ==> storage-pool-definition create drbdpool
LINSTOR ==> storage-pool create lvm Satellite1 drbdpool drbdpool
LINSTOR ==> storage-pool create lvm Satellite2 drbdpool drbdpool 

Creating a replicated DRBD resource

So far we loaded the kernel module on the satellites, started the controller and satellite containers and configured the LINSTOR cluster. Now it is time to actually create resources.

docker run -it --rm -e LS_CONTROLLERS=Controller \ 
  drbd.io/linstor-client interactive
  ... 
- volume-definition (vd)
 LINSTOR ==> resource-definition create demo
 LINSTOR ==> volume-definition create demo 1G
 LINSTOR ==> resource create demo --storage-pool drbdpool --auto-place 2 

If you have drbd-utils installed on the host, you can now see the DRBD resource as usual viadrbdsetup status. But we can also use a container to do that. On one of the satellites you can run a throw-away linstor-satellite container which contains drbd-utils:

docker run -it --rm --net=host --privileged \
 --entrypoint=/bin/bash drbd.io/linstor-satellite
$ drbdsetup status
$ lvs

Note that by default you will not see the symbolic links for the backing devices created by LVM/udev in the LINSTOR satellite container. That is expected. In the container you will see something like /dev/drbdpool/demo_00000, while on the host you will only see/dev/dm-X, and  lvs will not show the LVs. If you really want to see the LVs on the host, you could execute  lvscan -a --cache, but there is no actual reason for that. One might also map the lvmetad socket to the container.

Summary

As you can see, LINBIT’s container story is now complete. It is now possible to deploy the whole stack via containers. This ranges from the lowest level of providing the kernel modules to the highest level of LINSTOR SDS including the client, the controller, and satellites.

Roland Kammerer
Software Engineer at Linbit
Roland Kammerer studied technical computer science at the Vienna University of Technology and graduated with distinction. Currently, he is a PhD candidate with a research focus on time-triggered realtime-systems and works for LINBIT in the DRBD development team.

Rest Easy

Once upon a time, in an office far far away, LINSTOR was at v0.2 when I started using our own python API to write an OpenStack volume driver for LINSTOR. This LINSTOR API would allow a python script to provision and manage LINSTOR volumes. Therefore, Linstor.resource_create(rsc_name=”mine”, node_name=”not_yours”) would create a LINSTOR resource called “mine” on a computer “not_yours.”  Similarly, Linstor.node_list() would return the list of storage nodes in the current LINSTOR cluster.

After a healthy amount of coffee and snacks, my driver started making progress creating volumes and snapshots in OpenStack. As the project progressed and I became more comfortable using the API, I wrote a small GUI prototype to mimic the functionality of using LINSTOR volume provisioning in OpenStack. I found a python-based GUI library called REMI which offered rich GUI features within a single python library. The REMI offered fast prototyping with minimum overhead and even cross-platform deployment.

I wrote a small proof-of-concept GUI script called LINSTOR View and it manages LINSTOR volumes with a graphical interface. REMI provides the UI while my script uses the API calls to manage backend storage. The prototype could list, create, and delete LINSTOR volumes.

Fast forward to 2019: LINSTOR v0.9.2 was just released along with DRBD v9.0.17. One of the many new features of LINSTOR is a REST API. Just like any typical REST implementation, a POST request will create a LINSTOR asset while DELETE does the opposite. Similarly, a PUT request will modify an asset and so on. Software-defined-storage (SDS) with LINSTOR gets even easier with this API. I believe this REST API will allow for easier development with LINSTOR and faster integration with other platforms.

The release notes are available here, along with a few other goodies. But without further stealing the thunder from Rene Peinthor and the rest of the Viennese development team, I bid auf Wiedersehen. I look forward to new developments as LINSTOR nears v1.0.0.

For any questions or comments, please feel free to reach me in the comments below or at [email protected].

Woojay Poynter
IO Plumber
Woojay is working on data replication and software-defined-storage with LINSTOR, built on DRBD @LINBIT. He has worked on web development, embedded firmwares, professional culinary education, power carving with ice and wood. He is a proud father and likes to play with legos.

cloudfest rust

Meet us at Cloudfest in Rust, Germany!

Come and meet LINBIT at CloudFest 2019 in Europa-Park, Rust, Germany. (23rd – 29rd of March 2019)

cloudfest 2019 booth linbit

CloudFest has made a name over the last few years as one of the the best cloud-focused industry events in which to network and have a good time. This year more than 7,000 people are attending the event. Attendees will hear from leaders in the business and get the latest industry buzz. 

The speaker line-up includes names like Dr. Ye Huang, Head of Solution Architects at Alibaba, Will Pemble, CEO at Goal Boss, Bhavin Turakhia, CEO at Flock, or Brian Behlendorf, Inventor of the Apache Web server. 

VISIT us at Booth H24!

LINBIT is announcing some exciting news at Cloudfest: NVMe-oF with LINSTOR! Meaning LINSTOR can now be used as a standalone product, independent from DRBD. NVMe-oF supports Infiniband with RDMA and allows ultrafast performance, easily handling workloads for Big Data Analytics or Artificial Intelligence. Come say hello at Cloudfest! Visit us at Booth H24. 

We are looking forward to you!

Booth visitors will be rewarded with a surprise that even your family will love! 🙂 

man-person-jumping-desert

LINSTOR grows beyond DRBD

For quite some time, LINSTOR has been able to use NVMe-oF storage targets via the Swordfish API. This was expressed in LINSTOR as a resource definition that contains a single resource with one backing disk (that is the NVMe-oF target) and one diskless resource (that is the NVMe-oF initiator).

Layers in the storage stack

In the last few months the team has been busy making LINSTOR more generic, adding support for resource templates. A resource template describes a storage stack in terms of layers for specific resources/volumes. Here are some examples of such storage stacks:

    • DRBD on top of logic volumes (LVM)
    • DRBD on top of zVols (ZFS)
    • Swordfish initiator & target on top of logic volumes (LVM)
    • DRBD on top of LUKS on top of logic volumes (LVM)
  • LVM only

The team came up with an elegant approach that introduces these additional resource templates in ways that allow existing LINSTOR configurations to keep their semantics as the default resource templates.

With this decoupling, we no longer need to have DRBD installed on LINSTOR clusters that do not require the replication functions of DRBD.

What does that mean for DRBD?

The interests of LINBIT’s customers vary widely. Some want to use LINSTOR without DRBD – which is now supported. A very prominent example of this is Intel, who uses LINSTOR in its Rack Scale Design effort to connect storage nodes and compute nodes with NVMe-oF. In this example, the storage is disaggregated from the other nodes.

Other customers see converged architectures as a better fit. For converged scenarios, DRBD has many advantages over a pure data access protocol such as NVMe-oF. LINSTOR is built from the ground up to manage DRBD, therefore, the need for DRBD support will remain.

Linux-native NVMe-oF and NVMe/TCP

SNIA’s Swordfish has clear benefits with creating a standard for managing storage targets such as allowing optimized storage target implementations, as well as a hardware-accelerated data-path, non-Linux control path.

Due to the fact that Swordfish is an extension of Redfish, which needs to be implemented in the Baseboard Management Controller (BMC), we have decided to extend LINSTOR’s driver set to configure NVMe-oF target and initiator software. We do this by utilizing existing tools found within the Linux operating system, eliminating the need for a Swordfish software stack.

Summary

LINSTOR now supports configurations without DRBD. It is now a unified storage orchestrator for replicated and non-replicated storage.

Philipp Reisner on Linkedin
Philipp Reisner
Philipp Reisner is founder and CEO of LINBIT in Vienna/Austria. His professional career has been dominated by developing DRBD, a storage replication for Linux. Today he leads a company of about 30 employees with locations in Vienna/Austria and Portland/Oregon.

Nvme-oF-Linstor-speed

Speed Up! NVMe-oF for LINSTOR

What is NVMe?

The storage world has gained a number of new terms in the last few years. Let’s start with NVMe. The abbreviation stands for Non-Volatile Memory express, which isn’t very self-explanatory. It all began a few years back when NAND Flash started to make major inroads into the storage industry, and the new storage medium needed to be accessed through existing interfaces like SATA and Serial attached SCSI (SAS).

Back at that time, FusionIO created a NAND flash-based SSD that was directly plugged into the PCIe slot of a server. This eliminated the bottleneck of the ATA or SCSI command sets and the interfaces coming from a time of rotating storage media.

The FusionIO products shipped with proprietary drivers, and the industry set forth in creating an open standard that suits the performance of NAND flash. One of the organizations where the players of the industry can meet, align and create standards is the Storage Networking Industry Association ( SNIA).

The first NVMe standard was published in 2013, and it describes a PCIe-based interface and command set to access fast storage. This can be thought of as a cleaned up version of the ATA or SCSI commands plus a PCIe interface.

What is NVMe-oF and NVMe/TCP?

Similar to what iSCSI is to SCSI, NVMe-oF or NVMe/TCP are standards that describe how to send the NVMe commands over networks. NVMe-oF requires a RDMA-capable network (like InfiniBand or RoCE), and NVMe/TCP works on every network that can carry IP traffic.

There are two terms of which to be aware: 1) the initiator is where the applications run that want to access the dataset. Linux comes with a built-in initiator, likewise other OSes already have initiators or will have them soon.

And, 2) the target is where the data is stored. Linux comes with a software target built into the kernel. It might not be obvious that any Linux block device can be made available as a NVMe-oF target using the Linux target software. It is not limited to NVMe devices.

What does this have to do with Swordfish?

While the iSCSI or NVMe-oF standards describe how the READ, WRITE and other operations on block data are shipped from the initiator to the target, they do not describe how a target (volume) gets created or configured. For too many years, this was the realm of vendor specific APIs and GUIs.

SNIA’s Swordfish standard describes how to manage storage targets and make it accessible as NVMe-oF targets. It is a REST API with JSON data. As such, it is easy to understand and embrace.

The major drawback of Swordfish is mainly that it is defined as an extension of Redfish. Redfish is a standard to manage servers over the network. It can be thought of as a modernized IPMI. As such, Redfish will usually be implemented on a Baseboard Management Controller (BMC). While Redfish has its advantages over IPMI, it does not provide something completely new.

On the other hand, Swordfish is something that was not there before, but as it is an extension to Redfish, an implementation of it usually means that the BMC of the machine needs to have a Redfish-enabled BMC, which may hinder or slow down the adoption of Swordfish.

LINSTOR

Since version 0.7, LINSTOR is capable of working with storage provided by Swordfish-compliant storage targets, as well as their initiator counterparts.

Summary

LINSTOR has gained the capability of managing storage on Swordfish/NVMe-oF targets besides working with DRBD and direct attached storage on Linux servers.

Philipp Reisner on Linkedin
Philipp Reisner
Philipp Reisner is founder and CEO of LINBIT in Vienna/Austria. His professional career has been dominated by developing DRBD, a storage replication for Linux. Today he leads a company of about 30 employees with locations in Vienna/Austria and Portland/Oregon.

LINSTOR High Level Resource API

High Level Resource API – The simplicity of creating replicated volumes

In this blog post, we present one of our recent extensions to the LINSTOR ecosystem: A high-level, user-friendly Python API that allows simple DRBD resource management via LINSTOR.

Background: So far LINSTOR components communicated by the following means: Via Protocol Buffers, or via the Python API that is used in the linstor command line client. Protocol Buffers are a great way to transport serialized structured data between LINSTOR components, but by themselves they don’t provide the necessary abstraction for developers.

That is not the job of Protocol Buffers. Since the early days we split the command line client into the client logic (parsing configuration files, parsing command line arguments…), and a Python library (python-linstor). This Python library provides all the bits and pieces to interact with LINSTOR. For example it provides a MultiLinstor class that handles TCP/IP communication to the LINSTOR controller. Additionally, it allows all the operations that are possible with LINSTOR (e.g. creating nodes, creating storage pools…). For perfectly valid reasons this API is very low level and pretty close to the actual Protocol Buffer messages sent to the LINSTOR controller.

By developing more and more plugins to integrate LINSTOR into other projects like OpenStack, OpenNebula, Docker Volumes, and many more, we saw that there is need for a higher level abstraction.

Finding the Right Abstraction

The first dimension of abstraction is to abstract from LINSTOR internals. For example it perfectly makes sense that recreating an existing resource is an error on a low level (think of it as EEXIST). On a higher level, depending on the actual object, trying to recreate an object might be perfectly fine and one wants to get the existing object (i.e. idem-potency).

The second dimension of abstraction is from DRBD and LINSTOR as a whole. Developers dealing with storage already have a good knowledge about concepts like nodes, storage pools, resource, volumes, placement policies… This is the part where we can make LINSTOR and DRBD accessible for new developers.

The third goal was to only provide a set of objects that are important in the context of the user/developer. This, for example, means that we can assume that the LINSTOR cluster is already set up, so we do not need to provide a high-level API to add nodes. For the higher-level API we can focus on [LINSTOR] resources. This allows us to satisfy the KISS (keep-it-simple-stupid) principle. A forth goal was to introduce new, higher-level concepts like placement policies. Placement policies/templates are concepts currently developed in core LINSTOR, but we can already provide basics on a higher level.

Demo Time

We start by creating a 10 GB big replicated LINSTOR/DRBD volume in a 3 node cluster. We want the volume to be 2 times redundant. Then we increase the size of the volume to 20 GB.

$ python
>> import linstor

>> foo = linstor.Resource('foo')

>> foo.volumes[0] = linstor.Volume("10 GB")

There are multiple ways to specify the size.

>> foo.placement.redundancy = 2

>> foo.autoplace()

>> foo.volumes[0].size += 10 * (2 ** 30)

This line is enough to resize a replicated volume cluster wide.

We needed 5 lines of code to create a replicated DRBD volume in a cluster! Let that sink in for a moment and compare it to the steps that were necessary without LINSTOR: Creating backing devices on all nodes, writing and synchronizing DRBD res(ource) files, creating meta-data on all nodes, drbdadm up the resource and force one to the Primary role to start the initial sync.

For the next step we assume that the volume is replicated and that we are a storage plugin developer. Our goal is to make sure the volume is accessible on every node because the block device should be used in a VM. So, A) make sure we can access the block device, and B) find out what the name of the block device of the first volume actually is:

>>> foo.activate(socket.gethostname())

>>> print(foo.volumes[0].device_path)

The method activate is one of these methods that shows how we intended abstraction. Note that we autoplaced the resource 2 times in a 3-node cluster. So LINSTOR chose the nodes that fit best. But now we want the resource to be accessible on every node without increasing the redundancy to 3 (because that would need additional storage and 2 times replicated data is good enough).

Diskless clients

Fortunately DRBD has us covered as it has the concept of diskless clients. These nodes provide a local block device as usual, but they read and write data from/to their peers only over the network (i.e. no local storage). Creating this diskless assignment is not necessary if the node was already part of the replication in the first place (then it already has access to the data locally).

This is exactly what activate does: If the node can already access the data – fine, if not, create a diskless assignment. Now assume we are done and we do not need access to the device anymore. We want to do some cleanup because we do not need a diskless assignment:

>>> foo.deactivate(socket.gethostname()) 

The semantic of this method is to remove the assignment if it is diskless (as it does not contribute to actual redundancy), but if it is a node that stores actual data, deactivate does nothing and keeps the data as redundant as it was. This is only a very small subset of the functionality the high-level API provides, there is a lot more to know like creating snapshots, converting diskless assignments to diskful ones and vice versa, or managing DRBD Proxy. For more information check the online documentation.

If you want to go deeper into the LINSTOR universe, please visit our youtube channel.

Roland Kammerer
Software Engineer at Linbit
Roland Kammerer studied technical computer science at the Vienna University of Technology and graduated with distinction. Currently, he is a PhD candidate with a research focus on time-triggered realtime-systems and works for LINBIT in the DRBD development team.

LINSTOR OpenStack Banner

How to Setup LINSTOR in OpenStack

This post will walk through the installation and setup procedures for deploying LINSTOR for a persistent, replicated, and high-performance source of block storage within DevStack version of OpenStack running on an Ubuntu host. We will refer to this Ubuntu host as the LINSTOR Controller. This setup also requires at least one additional Ubuntu node handling replicated data, and we will refer to this node as the LINSTOR Satellite. You may have more than one satellite nodes for increased redundancy.

Initial Requirement

The LINSTOR driver is a messenger between the underlying DRBD/LINSTOR and OpenStack. Therefore, both DRBD/LINSTOR as well as OpenStack must be pre-installed and configured. Once LINSTOR is installed, each node must be registered with LINSTOR and have a predefined storage pool on a thin LVM volume.

Install DRBD / LINSTOR on OpenStack Cinder node as a LINSTOR Controller node

# First, download and run a python script to enable LINBIT repo
curl -O 'https://my.linbit.com/linbit-manage-node.py'
chmod u+x linbit-manage-node.py
./linbit-manage-node.py

# Install the DRBD, LINSTOR, and LVM packages
sudo apt install -y drbd-dkms lvm2
sudo apt install -y linstor-controller linstor-satellite linstor-client
sudo apt install -y drbdtop

Configure the LINSTOR Controller

# Start both LINSTOR Controller and Satellite Services
systemctl enable linstor-controller.service
systemctl start linstor-controller.service
systemctl enable linstor-satellite.service
systemctl start linstor-satellite.service

# Create backend storage for DRBD/LINSTOR by creating a Volume Group 'drbdpool'
# Specify appropriate volume location (/dev/vdb)
sudo vgcreate drbdpool /dev/vdb

# Create a Logical Volume 'thinpool' within 'drbdpool'
# Specify appropriate thin volume size (64G)
sudo lvcreate -L 64G -T drbdpool/thinpool

Install DRBD / LINSTOR on all other LINSTOR Satellite node(s)

# First obtain and install DRBD / LINSTOR packages through LINBIT
# by running python script
sudo apt install -y drbd-dkms lvm2
sudo apt install -y linstor-satellite
sudo apt install -y drbdtop

Configure the LINSTOR Satellite node(s)

# Start LINSTOR Satellite Service
systemctl enable linstor-satellite.service
systemctl start linstor-satellite.service

# Create backend storage for DRBD/LINSTOR by creating a Volume Group 'drbdpool'
# Specify appropriate volume location (/dev/vdb)
sudo vgcreate drbdpool /dev/vdb

# Create a Logical Volume 'thinpool' within 'drbdpool'
# Specify appropriate thin volume size (64G)
sudo lvcreate -L 64G -T drbdpool/thinpool

Configure LINSTOR cluster (nodes and storage pool definitions) from the Controller node

# Create the controller node as combined controller and satellite node
linstor node create cinder-node-name 192.168.1.100 --node-type Combined

# Create the satellite node(s)
linstor node create another-linstor-node 192.168.1.101
# repeat to add more satellite nodes in the LINSTOR cluster

# Create LINSTOR Storage Pool on each nodes
# For each node, specify node name, its IP address, 
# storage pool name (DfltStorPool) and volume type (lvmthin)

# On the LINSTOR Controller 
linstor storage-pool create lvmthin cinder-node-name DfltStorPool \
    drbdpool/thinpool
# On the LINSTOR Satellite node(s)
linstor storage-pool create lvmthin another-linstor-node DfltStorPool \
    drbdpool/thinpool
# repeat to add a storage pool to each node in the LINSTOR cluster

 

Cinder Driver Installation & Configuration

Download the latest driver (linstordrv.py)

wget https://github.com/LINBIT/openstack-cinder/blob/stein-linstor/cinder/
volume/drivers/linstordrv.py

Install the driver file in the proper destination

/opt/stack/cinder/cinder/volume/drivers/linstordrv.py

Configure OpenStack Cinder by editing /etc/cinder/cinder.conf
to enable LINSTOR driver by adding ‘linstor’ to enabled_backends

[DEFAULT]
...
enabled_backends=lvm, linstor
…

Then, add a LINSTOR section at the bottom of the cinder.conf

[linstor]
volume_backend_name = linstor
volume_driver = cinder.volume.drivers.linstordrv.LinstorDrbdDriver
linstor_default_volume_group_name=drbdpool
linstor_default_uri=linstor://localhost
linstor_default_storage_pool_name=DfltStorPool
linstor_default_resource_size=1
linstor_volume_downsize_factor=4096
linstor_controller_diskless=False
iscsi_helper=tgtadm

Update Python libraries

sudo pip install protobuf --upgrade
sudo pip install eventlet --upgrade

Register LINSTOR with Cinder

cinder type-key linstor
cinder type-key linstor set volume_backend_name=linstor

Lastly, restart Cinder services

sudo systemctl restart [email protected]
sudo systemctl restart [email protected]
sudo systemctl restart [email protected]

 

Verification of proper installation

Check system journal for any driver errors

# Check if there is a recurring error after restart
sudo systemctl -f -u [email protected]* | grep error

Create a test volume with LINSTOR backend

# Create a 1GiB volume through Cinder and verify LINSTOR backing exists
openstack volume create --type linstor --size 1 --availability-zone nova \
    linstor-test-vol
openstack volume list
linstor resource list

Delete the test volume

# Delete the test volume and verify if LINSTOR removed resources correctly
openstack volume delete linstor-test-vol
linstor resource list

 

Final Comments

By now, the LINSTOR driver should have successfully created a Cinder volume and the matching LINSTOR resources on the backend and then removed them from Cinder. From this point on, managing LINSTOR volumes should be a breeze with OpenStack Horizon’s GUI interface.

Management of LINSTOR snapshots and creation of LINSTOR volumes from those snapshots are also possible. Once a LINSTOR volume becomes available, it can then be made accessible within a Nova instance by creating an attachment. Any LINSTOR-backed volume can then provide replicated and persistent storage.

Please direct any questions regarding the specifics about the driver to Woojay Poynter at [email protected]. For any inquiry regarding DRBD and LINSTOR technology please contact our sales team at [email protected].

Feel free to check out this demonstration of LINSTOR volume management in OpenStack:

 

Woojay Poynter
IO Plumber
Woojay is working on data replication and software-defined-storage with LINSTOR, built on DRBD @LINBIT. He has worked on web development, embedded firmwares, professional culinary education, power carving with ice and wood. He is a proud father and likes to play with legos.

Why you should use LINSTOR in OpenStack

With the LINSTOR volume driver for OpenStack, Linux storage created in OpenStack Cinder can be easily provisioned, managed and seamlessly replicated across a large Linux cluster.

LINSTOR is an open-source storage orchestrator designed to deliver easy-to-use software-defined storage in Linux environments. LINSTOR uses LINBIT’s DRBD to replicate block data with minimal overhead and CPU load. Managing a LINSTOR storage cluster is as easy as a few LINSTOR CLI commands or a few lines of Python code with the LINSTOR API.

LINSTOR pairs with Openstack

OpenStack paired with LINSTOR brings even greater power and flexibility by enabling Linux to become your SDS platform. Replicate storage wherever you need it with simple mouse clicks. Provision snapshots. Create new volumes with those snapshots. LINSTOR volumes can then be paired with the right compute nodes just as easily. Together, OpenStack and LINSTOR bring tremendous potential to provide robust infrastructure with ease, all powered by open-source.

Data replicated with LINSTOR can minimize downtime and data loss. Running your cloud on commodity hardware with the native Linux features underneath provides the most flexible, reliable, and cost-effective solution to hosting customized OpenStack deployment anywhere.

In addition to storage management and replication, LINBIT also offers Geo-Clustering solutions that work with LINSTOR to enable long-distance data replication inside private and public cloud environments.

For a quick recap, please check out this video on deploying LINSTOR volumes with OpenStack’s Horizon GUI.

More information about LINBIT’s DRBD and LINSTOR visit:

For LINSTOR OpenStack Drivers
https://github.com/LINBIT/openstack-cinder

For LINSTOR Driver Documentation:
https://docs.linbit.com/docs/users-guide-9.0/#ch-openstack-linstor

For LINBIT’s LINSTOR webpage:
https://www.linbit.com/en/linstor/

 

Woojay Poynter
IO Plumber
Woojay is working on data replication and software-defined-storage with LINSTOR, built on DRBD @LINBIT. He has worked on web development, embedded firmwares, professional culinary education, power carving with ice and wood. He is a proud father and likes to play with legos.