
DRBD/LINSTOR vs Ceph – a technical comparison

INTRODUCTION

The aim of this article is to give you some insight into CEPH, DRBD and LINSTOR by outlining their basic functions. The following points should help you compare these products and understand which is the right solution for your system. Before we start, you should be aware that LINSTOR is made for DRBD and that it is highly recommended to use LINSTOR if you are also using DRBD.

DRBD

DRBD works by inserting a thin layer between the file system, the buffer cache, and the disk driver. The DRBD kernel module captures all requests from the file system and splits them into two paths. So, how does the actual communication occur? How do two separate servers optimize data protection?

DRBD facilitates communication by mirroring two separate servers. One server, although passive, is usually a direct copy of the other. Any data written to the primary server is simultaneously copied to the secondary server through a real-time communication link, so the passive server immediately reflects any changes made to the data.

DRBD 8.x works on two nodes at a time. One is given the role of the primary node while the other is given a secondary role. Reads and writes can only occur on the primary node.
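
For illustration, a manual role switch with the drbdadm tool might look roughly like this (the resource name r0, the device path, and the mount point are placeholders for whatever your setup uses):

# Check the current roles, then promote this node and mount the file system
drbdadm role r0
drbdadm primary r0
mount /dev/drbd0 /mnt/data

# Unmount and demote again before promoting the other node
umount /mnt/data
drbdadm secondary r0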

THE BENEFITS OF DRBD 9

The features of DRBD 9.x are a vast improvement over the 8.x version. It is now possible to have up to 32 replicas, including the primary node. This gives you the ability to build your cluster setup with what we call diskless nodes, meaning you don’t have to use storage on your primary node. The primary node in diskless mode still has a DRBD block device, but the data is accessed on the secondary nodes over the network.

The secondary nodes must not mount the file system, not even in read-only mode. While it is true to say that the secondary nodes see all updates on the primary node, they can’t expose these updates to the file system, as DRBD is completely file system agnostic.

One write goes to the local disk and another to the mirrored disk on a peer node. If the first node fails, the file system can be brought up on one of the peer nodes and the data remains available for use.

DRBD has no precise knowledge of the file system and, as such, has no way of communicating changes upstream to the file system driver. The two-at-a-time rule, however, does not actually prevent DRBD from operating on more than two nodes.

Moreover, DRBD 9.x supports multiple peer nodes, meaning one peer might be a synchronous mirror in the local data center while another might be an asynchronous mirror at a remote site.

Again, the passive server only becomes active when the primary one fails. When such a failure occurs, Pacemaker recognizes it and fails over to the secondary server. This failover can be either automatic or manual: users who prefer manual failover must authorize the switch to the passive server themselves after the primary fails.
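
As a rough sketch of the automatic variant (assuming a DRBD resource named r0, the ocf:linbit:drbd resource agent, and the pcs shell of a recent Pacemaker), handing the promotion decision over to the cluster manager could look like this:

# Let Pacemaker manage the DRBD resource and promote one node automatically
pcs resource create drbd_r0 ocf:linbit:drbd drbd_resource=r0 \
    op monitor interval=60s
pcs resource promotable drbd_r0 promoted-max=1 promoted-node-max=1 notify=true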

LINSTOR

In larger IT infrastructures, cluster management software is state of the art. This is why LINBIT developed LINSTOR, a software layer on top of DRBD. DRBD itself is a perfect tool for replicating and accessing your data, especially when it comes to performance. LINSTOR makes configuring DRBD on a system with more than a few nodes an easy task: it manages DRBD and gives you the ability to set it up on a large system.

LINSTOR uses a controller service for managing your cluster and a satellite service, which runs on every node, for deploying DRBD. The controller can be accessed from every node and enables you to monitor and configure your setup quickly. It can be controlled externally via REST and provides a very clear CLI. Furthermore, the LINSTOR REST API gives you the ability to use LINSTOR volumes in Kubernetes, Proxmox VE, OpenNebula and OpenStack.
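
To give an idea of the CLI, a minimal workflow for creating a replicated volume might look like this (the resource name demo_vol, the size, and the replica count are just examples):

# Define a resource with one 10 GiB volume and let LINSTOR place three replicas
linstor node list
linstor resource-definition create demo_vol
linstor volume-definition create demo_vol 10G
linstor resource create demo_vol --auto-place 3
linstor resource list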

LINSTOR also keeps the system maintainable while it is in operation: the control plane is separated from the data plane, so upgrading or maintaining LINSTOR causes no downtime for the volumes. In comparison with Ceph, DRBD and LINSTOR are easier to troubleshoot, recover, repair, and debug, and it is easier to intervene manually if required, mainly due to their simplicity. For system administrators, the better maintainability and the less complex environment can be crucial, and the higher availability also results in better reliability. For instance, DRBD can be started and stopped manually even if LINSTOR is offline, or, for recovery purposes, even without DRBD installed (simply mount the backend storage). By contrast, trying to find any of your data on disks managed by Ceph can be quite a challenge if your Ceph system is down.
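
For illustration, such a manual recovery could look roughly like this (the resource name r0, the DRBD device number, and the backing volume path are placeholders that depend on how LINSTOR laid out your storage):

# Bring a DRBD resource up and promote it by hand while LINSTOR is offline
drbdadm up r0
drbdadm primary r0
mount /dev/drbd1000 /mnt/restore

# Or, as a last resort without DRBD at all, mount the LVM backing volume
# read-only and copy the data off it
mount -o ro /dev/drbdpool/r0_00000 /mnt/restore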

In summary, if you’re looking for increased performance, fast configuration, and filesystem-based storage for your applications, use LINSTOR and DRBD. If you’re looking to run LINSTOR with HA, however, you must use third-party software such as Pacemaker.

 

CEPH

CEPH is open source software intended to provide highly scalable object, block, and file-based storage in a unified system.

CEPH consists of a RADOS cluster and its interfaces. The RADOS cluster is a system with services for monitoring and storing data across many nodes. CEPH/RADOS is an object storage cluster with no single point of failure. This is achieved by an algorithm which cuts the data into blocks and spreads them across the RADOS cluster using self-managing services. The CRUSH algorithm is used to spread the data on upload and to put the blocks back together when an object is requested. CEPH can use simple data replication as well as erasure coding for those striped blocks.
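
To make this concrete, a minimal sketch of creating a replicated pool and storing an object directly through RADOS could look like this (pool, object, and file names are just examples):

# Create a pool with 64 placement groups and keep three replicas of each object
ceph osd pool create demo_pool 64
ceph osd pool set demo_pool size 3

# Store and retrieve an object
rados -p demo_pool put demo_object ./some_file
rados -p demo_pool get demo_object ./some_file.copy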

On top of the RADOS cluster, LIBRADOS is used to upload data to or request data from the cluster. CEPH uses LIBRADOS for its interfaces CEPHFS, RBD and RADOSGW.

CEPHFS gives you the ability to create a file system on a host where the data is stored in the CEPH cluster. To use CEPHFS, CEPH additionally needs metadata servers, which manage the metadata and balance the request load among themselves.
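
A hedged example of setting up and mounting a CEPHFS file system (the pool names, file system name, monitor address, and key are placeholders, and at least one metadata server must already be running):

# Create data and metadata pools, create the file system, and mount it
ceph osd pool create cephfs_data 64
ceph osd pool create cephfs_metadata 32
ceph fs new demo_fs cephfs_metadata cephfs_data
mount -t ceph 192.168.1.10:6789:/ /mnt/cephfs -o name=admin,secret=<admin-key>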

RBD, or RADOS block device, is used for creating virtual block devices on hosts, with a CEPH cluster managing and storing the data in the background. Since RBD is built on LIBRADOS, it inherits LIBRADOS’s abilities, including read-only snapshots and reverting to snapshots. By striping images across the cluster, CEPH improves read-access performance for large block device images. The block device can be virtualized, providing block storage to virtual machines in virtualization platforms such as Apache CloudStack, OpenStack, OpenNebula, Ganeti, and Proxmox Virtual Environment.
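
A small sketch of the RBD workflow (pool and image names are examples): create an image, map it as a local block device, and put a file system on it.

rbd create demo_pool/demo_image --size 10G
rbd map demo_pool/demo_image        # returns a device such as /dev/rbd0
mkfs.ext4 /dev/rbd0
mount /dev/rbd0 /mnt/rbd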

RADOSGW is the REST API for communicating with CEPH/RADOS when uploading data to and requesting data from the cluster.
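
For example, a gateway user whose S3-style credentials can then be used against the RADOSGW endpoint is created like this (the uid and display name are placeholders):

# Create a RADOSGW user; the command prints the S3 access and secret keys
radosgw-admin user create --uid=demo --display-name="Demo User"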

In general, CEPH is an object storage cluster with the advantage that you do not have to worry about failing nodes or storage drives: CEPH recognizes failing devices and instantly replicates their data to another disk, where it will be accessed. However, this recovery also puts a heavy load on the network when devices fail.

Striping the data comes with a disadvantage: it is not possible to access the data on a single storage drive by mounting it somewhere else, nor without a working CEPH cluster.

In conclusion, CEPH is the right solution if you are looking for object storage in your infrastructure. Due to its complexity, however, you have to expect lower performance in comparison to DRBD, which is limited only by your network speed.

Daniel Kaltenböck
Software Engineer at LINBIT HA Solutions GmbH
Daniel Kaltenböck studied technical computer science at the Vienna University of Technology. He is a software engineer at heart with a special focus on software-defined storage.

How to Setup LINSTOR in OpenStack

This post will walk through the installation and setup procedures for deploying LINSTOR as a persistent, replicated, and high-performance source of block storage within a DevStack version of OpenStack running on an Ubuntu host. We will refer to this Ubuntu host as the LINSTOR Controller. This setup also requires at least one additional Ubuntu node handling replicated data, and we will refer to this node as the LINSTOR Satellite. You may have more than one satellite node for increased redundancy.

Initial Requirements

The LINSTOR driver is a messenger between the underlying DRBD/LINSTOR and OpenStack. Therefore, both DRBD/LINSTOR as well as OpenStack must be pre-installed and configured. Once LINSTOR is installed, each node must be registered with LINSTOR and have a predefined storage pool on a thin LVM volume.

Install DRBD / LINSTOR on OpenStack Cinder node as a LINSTOR Controller node

# First, download and run a python script to enable LINBIT repo
curl -O 'https://my.linbit.com/linbit-manage-node.py'
chmod u+x linbit-manage-node.py
./linbit-manage-node.py

# Install the DRBD, LINSTOR, and LVM packages
sudo apt install -y drbd-dkms lvm2
sudo apt install -y linstor-controller linstor-satellite linstor-client
sudo apt install -y drbdtop

Configure the LINSTOR Controller

# Start both LINSTOR Controller and Satellite Services
systemctl enable linstor-controller.service
systemctl start linstor-controller.service
systemctl enable linstor-satellite.service
systemctl start linstor-satellite.service

# Create backend storage for DRBD/LINSTOR by creating a Volume Group 'drbdpool'
# Specify appropriate volume location (/dev/vdb)
sudo vgcreate drbdpool /dev/vdb

# Create a Logical Volume 'thinpool' within 'drbdpool'
# Specify appropriate thin volume size (64G)
sudo lvcreate -L 64G -T drbdpool/thinpool

Install DRBD / LINSTOR on all other LINSTOR Satellite node(s)

# First obtain and install the DRBD / LINSTOR packages through LINBIT
# by running the linbit-manage-node.py script (as on the Controller node)
sudo apt install -y drbd-dkms lvm2
sudo apt install -y linstor-satellite
sudo apt install -y drbdtop

Configure the LINSTOR Satellite node(s)

# Start LINSTOR Satellite Service
systemctl enable linstor-satellite.service
systemctl start linstor-satellite.service

# Create backend storage for DRBD/LINSTOR by creating a Volume Group 'drbdpool'
# Specify appropriate volume location (/dev/vdb)
sudo vgcreate drbdpool /dev/vdb

# Create a Logical Volume 'thinpool' within 'drbdpool'
# Specify appropriate thin volume size (64G)
sudo lvcreate -L 64G -T drbdpool/thinpool

Configure LINSTOR cluster (nodes and storage pool definitions) from the Controller node

# Create the controller node as combined controller and satellite node
linstor node create cinder-node-name 192.168.1.100 --node-type Combined

# Create the satellite node(s)
linstor node create another-linstor-node 192.168.1.101
# repeat to add more satellite nodes in the LINSTOR cluster

# Create a LINSTOR storage pool on each node
# For each node, specify the provider kind (lvmthin), the node name,
# the storage pool name (DfltStorPool) and the backing pool (drbdpool/thinpool)

# On the LINSTOR Controller 
linstor storage-pool create lvmthin cinder-node-name DfltStorPool \
    drbdpool/thinpool
# On the LINSTOR Satellite node(s)
linstor storage-pool create lvmthin another-linstor-node DfltStorPool \
    drbdpool/thinpool
# repeat to add a storage pool to each node in the LINSTOR cluster
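
At this point it is worth verifying the cluster layout from the controller node before moving on:

# List the registered nodes and storage pools
linstor node list
linstor storage-pool list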

 

Cinder Driver Installation & Configuration

Download the latest driver (linstordrv.py)

wget https://raw.githubusercontent.com/LINBIT/openstack-cinder/stein-linstor/cinder/volume/drivers/linstordrv.py

Install the driver file in the proper destination

/opt/stack/cinder/cinder/volume/drivers/linstordrv.py
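
Assuming the file was downloaded into the current directory, installing it is just a copy to the destination path above:

sudo cp linstordrv.py /opt/stack/cinder/cinder/volume/drivers/linstordrv.py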

Configure OpenStack Cinder by editing /etc/cinder/cinder.conf:
enable the LINSTOR driver by adding ‘linstor’ to enabled_backends

[DEFAULT]
...
enabled_backends=lvm, linstor
…

Then, add a LINSTOR section at the bottom of the cinder.conf

[linstor]
volume_backend_name = linstor
volume_driver = cinder.volume.drivers.linstordrv.LinstorDrbdDriver
linstor_default_volume_group_name=drbdpool
linstor_default_uri=linstor://localhost
linstor_default_storage_pool_name=DfltStorPool
linstor_default_resource_size=1
linstor_volume_downsize_factor=4096
linstor_controller_diskless=False
iscsi_helper=tgtadm

Update Python libraries

sudo pip install protobuf --upgrade
sudo pip install eventlet --upgrade

Register LINSTOR with Cinder

cinder type-create linstor
cinder type-key linstor set volume_backend_name=linstor

Lastly, restart Cinder services

# In DevStack, the Cinder services run as devstack@ systemd units
sudo systemctl restart devstack@c-api.service
sudo systemctl restart devstack@c-sch.service
sudo systemctl restart devstack@c-vol.service

 

Verification of proper installation

Check system journal for any driver errors

# Check if there is a recurring error after restart
sudo journalctl -f -u devstack@c-vol.service | grep error

Create a test volume with LINSTOR backend

# Create a 1GiB volume through Cinder and verify LINSTOR backing exists
openstack volume create --type linstor --size 1 --availability-zone nova \
    linstor-test-vol
openstack volume list
linstor resource list

Delete the test volume

# Delete the test volume and verify if LINSTOR removed resources correctly
openstack volume delete linstor-test-vol
linstor resource list

 

Final Comments

By now, the LINSTOR driver should have successfully created a Cinder volume and the matching LINSTOR resources on the backend and then removed them from Cinder. From this point on, managing LINSTOR volumes should be a breeze with OpenStack Horizon’s GUI.

Management of LINSTOR snapshots and creation of LINSTOR volumes from those snapshots are also possible. Once a LINSTOR volume becomes available, it can then be made accessible within a Nova instance by creating an attachment. Any LINSTOR-backed volume can then provide replicated and persistent storage.

Please direct any questions regarding the specifics of the driver to Woojay Poynter. For any inquiry regarding DRBD and LINSTOR technology, please contact the LINBIT sales team.

Feel free to check out this demonstration of LINSTOR volume management in OpenStack:

 

Woojay Poynter
IO Plumber
Woojay is working on data replication and software-defined storage with LINSTOR, built on DRBD @LINBIT. He has worked on web development, embedded firmware, professional culinary education, and power carving with ice and wood. He is a proud father and likes to play with Legos.