LINSTOR CSI Plugin for Kubernetes

A few weeks ago, LINBIT publicly released the LINSTOR CSI (Container Storage Interface) plugin. This means LINSTOR now has a standardized way of working with any container orchestration platform that supports CSI. Kubernetes is one of those platforms, so our developers put in the work to make LINSTOR integration with Kubernetes easy, and I’ll show you how!

You’ll need a couple things to get started:

  • Kubernetes Cluster (1.12.x or newer)
  • LINSTOR Cluster

LINSTOR’s CSI plugin requires certain Kubernetes feature gates be enabled on the kube-apiserver and each kubelet.

Enable the CSINodeInfo and CSIDriverRegistry feature gates on the kube-apiserver by adding --feature-gates=CSINodeInfo=true,CSIDriverRegistry=true to the list of arguments passed to the kube-apiserver system pod in the /etc/kubernetes/manifests/kube-apiserver.yaml manifest. It should look something like this:

# cat /etc/kubernetes/manifests/kube-apiserver.yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""
  creationTimestamp: null
  labels:
    component: kube-apiserver
    tier: control-plane
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-apiserver
    ... snip ...
    - --feature-gates=CSINodeInfo=true,CSIDriverRegistry=true
    ... snip ...

To enable these feature gates on the kubelet, you’ll need to add the following argument to the KUBELET_EXTRA_ARGS variable in the /etc/sysconfig/kubelet file: --feature-gates=CSINodeInfo=true,CSIDriverRegistry=true. Your config should look something like this:

# cat /etc/sysconfig/kubelet 
KUBELET_EXTRA_ARGS="--feature-gates=CSINodeInfo=true,CSIDriverRegistry=true"
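
For the new arguments to take effect, the kubelet service has to be restarted on each node; a minimal sketch, assuming a systemd-managed kubelet:

# systemctl daemon-reload
# systemctl restart kubelet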

Once you’ve modified those two configurations, you can prepare the configuration for the CSI plugin’s sidecar containers. Use curl to download the latest version of the plugin definition:

# curl -O \
https://raw.githubusercontent.com/LINBIT/linstor-csi/master/examples/k8s/deploy/linstor-csi.yaml

Set the value of each instance of LINSTOR-IP in linstor-csi.yaml to the IP address of your LINSTOR Controller. The placeholder IP in the example yaml is 192.168.100.100, so you can use the following command to update the address (or edit the file in an editor); simply set CON_IP to your controller’s IP address:

# CON_IP="x.x.x.x"; sed -i.example s/192\.168\.100\.100/$CON_IP/g linstor-csi.yaml
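
As a quick sanity check before deploying, you can grep the file to confirm the controller address is now in place:

# grep -n "$CON_IP" linstor-csi.yaml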

Finally, apply the yaml to the Kubernetes cluster:

# kubectl apply -f linstor-csi.yaml

You should now see the linstor-csi sidecar pods running in the kube-system namespace:

# watch -n1 -d kubectl get pods --namespace=kube-system --output=wide

Once running, you can define storage classes in Kubernetes that point to your LINSTOR storage pools, from which you can then provision persistent volumes, optionally replicated by DRBD, for your containers.

Here is an example yaml definition that references a LINSTOR storage pool in my cluster named thin-lvm:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-autoplace-1-thin-lvm
provisioner: io.drbd.linstor-csi
parameters:
  autoPlace: "1"
  storagePool: "thin-lvm"

And here is an example yaml definition for a persistent volume claim carved out of the above storage class:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    volume.beta.kubernetes.io/storage-class: linstor-autoplace-1-thin-lvm
  name: linstor-csi-pvc-0
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

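To consume the claim, you reference it from a pod spec like any other PVC. Here is a minimal sketch; the pod name, image, and mount path are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: linstor-csi-test-pod
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: linstor-csi-pvc-0
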
Put it all together and you’ve got yourself an open source, high performance, block device provisioner for your persistent workloads in Kubernetes!

There are many ways to craft your storage class definitions for node selection, storage tiering, diskless attachments, or even off site replicas. We’ll be working on our documentation surrounding new features, so stay tuned, and don’t hesitate to reach out for the most UpToDate information about LINBIT’s software!

Read more: CSI Plugin for LINSTOR Complete.

Matt Kereczman
Matt is a Linux Cluster Engineer at LINBIT with a long history of Linux System Administration and Linux System Engineering. Matt is a cornerstone in LINBIT’s support team, and plays an important role in making LINBIT’s support great. Matt was President of the GNU/Linux Club at Northampton Area Community College prior to graduating with Honors from Pennsylvania College of Technology with a BS in Information Security. Open Source Software and Hardware are at the core of most of Matt’s hobbies.

How to Setup LINSTOR in OpenStack

This post will walk through the installation and setup procedures for deploying LINSTOR as a persistent, replicated, and high-performance source of block storage within the DevStack version of OpenStack running on an Ubuntu host. We will refer to this Ubuntu host as the LINSTOR Controller. This setup also requires at least one additional Ubuntu node handling replicated data, and we will refer to this node as the LINSTOR Satellite. You may have more than one satellite node for increased redundancy.

Initial Requirement

The LINSTOR driver is a messenger between the underlying DRBD/LINSTOR and OpenStack. Therefore, both DRBD/LINSTOR as well as OpenStack must be pre-installed and configured. Once LINSTOR is installed, each node must be registered with LINSTOR and have a predefined storage pool on a thin LVM volume.

Install DRBD / LINSTOR on OpenStack Cinder node as a LINSTOR Controller node

# First, download and run a python script to enable LINBIT repo
curl -O 'https://my.linbit.com/linbit-manage-node.py'
chmod u+x linbit-manage-node.py
./linbit-manage-node.py

# Install the DRBD, LINSTOR, and LVM packages
sudo apt install -y drbd-dkms lvm2
sudo apt install -y linstor-controller linstor-satellite linstor-client
sudo apt install -y drbdtop

Configure the LINSTOR Controller

# Start both LINSTOR Controller and Satellite Services
systemctl enable linstor-controller.service
systemctl start linstor-controller.service
systemctl enable linstor-satellite.service
systemctl start linstor-satellite.service

# Create backend storage for DRBD/LINSTOR by creating a Volume Group 'drbdpool'
# Specify appropriate volume location (/dev/vdb)
sudo vgcreate drbdpool /dev/vdb

# Create a Logical Volume 'thinpool' within 'drbdpool'
# Specify appropriate thin volume size (64G)
sudo lvcreate -L 64G -T drbdpool/thinpool
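
If you want to double-check the backing storage, the standard LVM tools will show the new volume group and thin pool:

# Verify the volume group and thin pool
sudo vgs drbdpool
sudo lvs drbdpool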

Install DRBD / LINSTOR on all other LINSTOR Satellite node(s)

# First obtain and install DRBD / LINSTOR packages through LINBIT
# by running python script
sudo apt install -y drbd-dkms lvm2
sudo apt install -y linstor-satellite
sudo apt install -y drbdtop

Configure the LINSTOR Satellite node(s)

# Start LINSTOR Satellite Service
systemctl enable linstor-satellite.service
systemctl start linstor-satellite.service

# Create backend storage for DRBD/LINSTOR by creating a Volume Group 'drbdpool'
# Specify appropriate volume location (/dev/vdb)
sudo vgcreate drbdpool /dev/vdb

# Create a Logical Volume 'thinpool' within 'drbdpool'
# Specify appropriate thin volume size (64G)
sudo lvcreate -L 64G -T drbdpool/thinpool

Configure LINSTOR cluster (nodes and storage pool definitions) from the Controller node

# Create the controller node as combined controller and satellite node
linstor node create cinder-node-name 192.168.1.100 --node-type Combined

# Create the satellite node(s)
linstor node create another-linstor-node 192.168.1.101
# repeat to add more satellite nodes in the LINSTOR cluster

# Create a LINSTOR storage pool on each node
# For each node, specify the volume type (lvmthin), the node name,
# the storage pool name (DfltStorPool), and the backing thin pool

# On the LINSTOR Controller 
linstor storage-pool create lvmthin cinder-node-name DfltStorPool \
    drbdpool/thinpool
# On the LINSTOR Satellite node(s)
linstor storage-pool create lvmthin another-linstor-node DfltStorPool \
    drbdpool/thinpool
# repeat to add a storage pool to each node in the LINSTOR cluster
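
Before moving on to Cinder, it is worth confirming that all nodes are online and the storage pools are registered; these read-only commands can be run from the controller:

# Verify the cluster layout
linstor node list
linstor storage-pool list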

 

Cinder Driver Installation & Configuration

Download the latest driver (linstordrv.py)

wget https://raw.githubusercontent.com/LINBIT/openstack-cinder/stein-linstor/cinder/volume/drivers/linstordrv.py

Install the driver file in the proper destination

/opt/stack/cinder/cinder/volume/drivers/linstordrv.py

Configure OpenStack Cinder to enable the LINSTOR driver by editing /etc/cinder/cinder.conf and adding ‘linstor’ to enabled_backends:

[DEFAULT]
...
enabled_backends=lvm, linstor
…

Then, add a LINSTOR section at the bottom of the cinder.conf

[linstor]
volume_backend_name = linstor
volume_driver = cinder.volume.drivers.linstordrv.LinstorDrbdDriver
linstor_default_volume_group_name=drbdpool
linstor_default_uri=linstor://localhost
linstor_default_storage_pool_name=DfltStorPool
linstor_default_resource_size=1
linstor_volume_downsize_factor=4096
linstor_controller_diskless=False
iscsi_helper=tgtadm

Update Python libraries

sudo pip install protobuf --upgrade
sudo pip install eventlet --upgrade

Register LINSTOR with Cinder

cinder type-create linstor
cinder type-key linstor set volume_backend_name=linstor
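
To confirm the registration, the openstack client can list the volume types and show the extra specs of the new type (a quick check, assuming python-openstackclient is installed):

openstack volume type list
openstack volume type show linstor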

Lastly, restart Cinder services

sudo systemctl restart devstack@c-vol.service
sudo systemctl restart devstack@c-sch.service
sudo systemctl restart devstack@c-api.service

 

Verification of proper installation

Check system journal for any driver errors

# Check if there is a recurring error after restart
sudo journalctl -f -u devstack@c-* | grep error

Create a test volume with LINSTOR backend

# Create a 1GiB volume through Cinder and verify LINSTOR backing exists
openstack volume create --type linstor --size 1 --availability-zone nova \
    linstor-test-vol
openstack volume list
linstor resource list

Delete the test volume

# Delete the test volume and verify if LINSTOR removed resources correctly
openstack volume delete linstor-test-vol
linstor resource list

 

Final Comments

By now, the LINSTOR driver should have successfully created a Cinder volume and the matching LINSTOR resources on the backend and then removed them from Cinder. From this point on, managing LINSTOR volumes should be a breeze with OpenStack Horizon’s GUI interface.

Management of LINSTOR snapshots and creation of LINSTOR volumes from those snapshots are also possible. Once a LINSTOR volume becomes available, it can then be made accessible within a Nova instance by creating an attachment. Any LINSTOR-backed volume can then provide replicated and persistent storage.

Please direct any questions regarding the specifics of the driver to Woojay Poynter. For any inquiry regarding DRBD and LINSTOR technology, please contact our sales team.

Feel free to check out this demonstration of LINSTOR volume management in OpenStack:

 

Woojay Poynter
IO Plumber
Woojay is working on data replication and software-defined-storage with LINSTOR, built on DRBD @LINBIT. He has worked on web development, embedded firmwares, professional culinary education, power carving with ice and wood. He is a proud father and likes to play with legos.

Why you should use LINSTOR in OpenStack

With the LINSTOR volume driver for OpenStack, Linux storage created in OpenStack Cinder can be easily provisioned, managed and seamlessly replicated across a large Linux cluster.

LINSTOR is an open-source storage orchestrator designed to deliver easy-to-use software-defined storage in Linux environments. LINSTOR uses LINBIT’s DRBD to replicate block data with minimal overhead and CPU load. Managing a LINSTOR storage cluster is as easy as a few LINSTOR CLI commands or a few lines of Python code with the LINSTOR API.

LINSTOR pairs with OpenStack

OpenStack paired with LINSTOR brings even greater power and flexibility by enabling Linux to become your SDS platform. Replicate storage wherever you need it with simple mouse clicks. Provision snapshots. Create new volumes with those snapshots. LINSTOR volumes can then be paired with the right compute nodes just as easily. Together, OpenStack and LINSTOR bring tremendous potential to provide robust infrastructure with ease, all powered by open-source.

Data replicated with LINSTOR can minimize downtime and data loss. Running your cloud on commodity hardware with native Linux features underneath provides the most flexible, reliable, and cost-effective way to host customized OpenStack deployments anywhere.

In addition to storage management and replication, LINBIT also offers Geo-Clustering solutions that work with LINSTOR to enable long-distance data replication inside private and public cloud environments.

For a quick recap, please check out this video on deploying LINSTOR volumes with OpenStack’s Horizon GUI.

For more information about LINBIT’s DRBD and LINSTOR, visit:

For LINSTOR OpenStack Drivers
https://github.com/LINBIT/openstack-cinder

For LINSTOR Driver Documentation:
https://docs.linbit.com/docs/users-guide-9.0/#ch-openstack-linstor

For LINBIT’s LINSTOR webpage:
https://www.linbit.com/en/linstor/

 

Woojay Poynter
IO Plumber
Woojay is working on data replication and software-defined-storage with LINSTOR, built on DRBD @LINBIT. He has worked on web development, embedded firmwares, professional culinary education, power carving with ice and wood. He is a proud father and likes to play with legos.

LINSTOR is officially part of OpenStack

It’s Official. LINSTOR Volume Driver is Now a Part of OpenStack Cinder.


With this code merge, the LINSTOR volume driver is now officially part of OpenStack and brings a new level of software-defined storage (SDS) service to Cinder, OpenStack’s volume service.

While the next OpenStack release named ‘Stein’ won’t be out until April, the latest LINSTOR driver is already available on our GitHub repo.

Stay tuned for more news and updates from LINBIT.


How to Setup LINSTOR with OpenNebula

This post will guide you through the setup of the LINSTOR – OpenNebula Addon. After completing it, you will be able to easily live-migrate virtual machines between OpenNebula nodes, and additionally, have data redundancy.

Setup LINSTOR with OpenNebula

This post assumes that you already have OpenNebula installed and running on all of your nodes. First, I will give you a quick guide for installing LINSTOR; for more detailed documentation, please read the DRBD User’s Guide. The second part will show you how to add a LINSTOR image and system datastore to OpenNebula.

We will assume the following node setup:

Node name   IP         Role
alpha       10.0.0.1   Controller / ON front-end
bravo       10.0.0.2   Virtualization host
charlie     10.0.0.3   Virtualization host
delta       10.0.0.4   Virtualization host

Make sure you have configured two lvm-thin storage pools named linstorpool/thin_image and linstorpool/thin_system on all of your nodes, for example as sketched below.
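
If those pools do not exist yet, they can be created with LVM on every node. A minimal sketch, assuming an empty disk at /dev/vdb and pool sizes you will want to adjust:

# Create the backing volume group and the two thin pools on every node
vgcreate linstorpool /dev/vdb
lvcreate -L 20G -T linstorpool/thin_system
lvcreate -L 100G -T linstorpool/thin_image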

LINSTOR setup

Install LINSTOR packages

The easiest setup is to install the linstor-controller on the same node as the OpenNebula cloud front-end. The linstor-opennebula package contains our OpenNebula driver, and therefore, is essential on the OpenNebula cloud front-end node. On this node install the following packages:

apt install drbd-dkms drbd-utils python-linstor linstor-satellite linstor-client linstor-controller linstor-opennebula

After the installation completes, start the linstor-controller service and enable it:

systemctl start linstor-controller
systemctl enable linstor-controller

On all other virtualization nodes you do not need the linstor-controller, linstor-client or linstor-opennebula packages:

apt install drbd-dkms drbd-utils python-linstor linstor-satellite

For all nodes (including the controller) you have to start and enable the linstor-satellite:

systemctl start linstor-satellite
systemctl enable linstor-satellite

Now all LINSTOR-related services should be running.

Adding and configuring LINSTOR nodes

All nodes that should work as virtualization nodes need to be added to LINSTOR, so that storage can be distributed and activated on all nodes:

linstor node create alpha 10.0.0.1 --node-type Combined
linstor node create bravo 10.0.0.2
linstor node create charlie 10.0.0.3
linstor node create delta 10.0.0.4

Now we will configure the system and image lvm-thin pools with LINSTOR:

linstor storage-pool create lvmthin alpha open_system linstorpool/thin_system
linstor storage-pool create lvmthin bravo open_system linstorpool/thin_system
linstor storage-pool create lvmthin charlie open_system linstorpool/thin_system
linstor storage-pool create lvmthin delta open_system linstorpool/thin_system

linstor storage-pool create lvmthin alpha open_image linstorpool/thin_image
linstor storage-pool create lvmthin bravo open_image linstorpool/thin_image
linstor storage-pool create lvmthin charlie open_image linstorpool/thin_image
linstor storage-pool create lvmthin delta open_image linstorpool/thin_image

For testing we can now try to create a dummy test resource:

linstor resource-definition create dummy
linstor volume-definition create dummy 10M
linstor resource create dummy --auto-place 3 -s open_image

If everything went fine with the above commands you should be able to see a resource created on 3 nodes using our default lvm-thin storage pool:

linstor resource list-volumes

Now we can delete the created dummy resource:

linstor resource-definition delete dummy

LINSTOR is now setup and ready to be used by OpenNebula.

OpenNebula LINSTOR datastores

OpenNebula uses different types of datastores: system, image and files.

LINSTOR supports the system and image datastore types.

  • System datastore is used to store a small context image that stores all information needed to run a virtual machine (VM) on a node.
  • Image datastore, as its name suggests, stores VM images.

OpenNebula doesn’t need to be configured with a LINSTOR system datastore; it will also work with its default system datastore, but using a LINSTOR system datastore gives you some data redundancy advantages.

Setup LINSTOR datastore drivers

As LINSTOR is an addon driver for OpenNebula, the LINSTOR OpenNebula driver needs to be registered with it. To do so, modify /etc/one/oned.conf and add linstor to the TM_MAD and DATASTORE_MAD sections.

TM_MAD = [
  executable = "one_tm",
  arguments = "-t 15 -d dummy,lvm,shared,fs_lvm,qcow2,ssh,vmfs,ceph,linstor"
]

Note that for the DATASTORE_MAD section the linstor driver has to be specified twice (once for the image datastore and once for the system datastore).

DATASTORE_MAD = [
    EXECUTABLE = "one_datastore",
    ARGUMENTS  = "-t 15 -d dummy,fs,lvm,ceph,dev,iscsi_libvirt,vcenter,linstor -s shared,ssh,ceph,fs_lvm,qcow2,vcenter,linstor"
]

And finally at the end of the configuration file, add new TM_MAD_CONF and DS_MAD_CONF sections for the linstor driver:

TM_MAD_CONF = [
    name = "linstor", ln_target = "NONE", clone_target = "SELF", shared = "yes", ALLOW_ORPHANS="yes"
]

DS_MAD_CONF = [
    NAME = "linstor", REQUIRED_ATTRS = "BRIDGE_LIST", PERSISTENT_ONLY = "NO", MARKETPLACE_ACTIONS = "export"
]

Now restart the OpenNebula service.

Adding LINSTOR datastore drivers

After we registered the LINSTOR driver with OpenNebula, we can add the image and system datastores.

For the system datastore we will create a configuration file and add it with the onedatastore tool. If you want to use more than 2 replicas, just edit the LINSTOR_AUTO_PLACE value.

cat >system_ds.conf <<EOI
NAME = linstor_system_auto_place
TM_MAD = linstor
TYPE = SYSTEM_DS
LINSTOR_AUTO_PLACE = 2
LINSTOR_STORAGE_POOL = "open_system"
BRIDGE_LIST = "alpha bravo charlie delta"
EOI

onedatastore create system_ds.conf

And we do nearly the same for the image datastore:

cat >image_ds.conf <<EOI
NAME = linstor_image_auto_place
DS_MAD = linstor
TM_MAD = linstor
TYPE = IMAGE_DS
DISK_TYPE = BLOCK
LINSTOR_AUTO_PLACE = 2
LINSTOR_STORAGE_POOL = "open_image"
BRIDGE_LIST = "alpha bravo charlie delta"
EOI

onedatastore create image_ds.conf

Now you should see 2 new datastores in the OpenNebula web front-end that are ready to use.
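
You can also verify this from the command line with the same onedatastore tool used above:

onedatastore list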

Usage and Notes

The new datastores can be used in the usual OpenNebula datastore selections and should support all OpenNebula features.

The LINSTOR datastores also have some configuration options, which are described on the driver’s GitHub repository page.

Data distribution

When an image is created in one of the LINSTOR datastores, its data is automatically placed on two of the four nodes (per LINSTOR_AUTO_PLACE = 2). The interested reader can check which ones were selected via linstor resource list:

linstor resource list

While interesting, it is important to know that the storage can be accessed by all nodes in the cluster via a DRBD feature called “diskless clients”. So let’s assume “alpha” and “bravo” had the most free space and were selected, and the VM was created on node “bravo”. Via the low level tool drbdadm status we now see that the resource is created on two nodes (i.e., “alpha” and “bravo”) and the DRBD resource is in “Primary” role on “bravo”.

Now we want to migrate the VM from “bravo” to node “charlie”. This is again done via a few clicks in the GUI, but the interesting steps happen behind the scenes: The storage plugin realizes that it has access to the data on “alpha” and “bravo” (our two replicas) but also needs access on “charlie” to execute the VM. The plugin therefore creates a diskless assignment on “charlie”. When you execute drbdadm status on “charlie”, you see that now three nodes are involved in the overall picture:

  • Alpha with storage in Secondary role
  • Bravo with storage in Secondary role
  • Charlie as a diskless client in Primary role

Diskless clients are created (and deleted) on demand without further user interaction, besides moving around VMs in the GUI. This means that if you now move the VM back to “bravo”, the diskless assignment on “charlie” gets deleted as it is no longer needed.

If you would have moved the VM from “charlie” to “delta”, the diskless assignment for “charlie” would have been deleted, and a new one for “delta” would have been created.

For you it is probably even more interesting that all of this including VM migration happens within seconds without moving the actual replicated storage contents.

Check out this demo of LINSTOR with OpenNebula:

 

 

Rene Peinthor
Software developer
Rene was one of the first developers to see a DRBD resource deployed by LINSTOR and has been a software developer at LINBIT since 2017.
While not squashing bugs in LINSTOR, Rene is either climbing or paragliding down a mountain.

CSI Plugin for LINSTOR Complete

This CSI plug-in allows for the use of LINSTOR volumes on Container Orchestrators that implement CSI, such as Kubernetes.

Preliminary work on the CSI plugin for LINSTOR is now complete and is capable of operating as an independent driver. CSI is a project by the Cloud Native Computing Foundation which aims to serve as an industry standard interface for container orchestration platforms. This allows storage vendors to write one storage driver and have it work across multiple platforms with little or no modification.

In practical terms, this means that LINSTOR is primed to work with current and emerging cloud technologies that implement CSI. Currently, work is being done to provide example deployments for Kubernetes, which should allow an easy way for Kubernetes admins to deploy the LINSTOR CSI plug-in.  We expect full support for Kubernetes integration in early 2019.

Get the code on GitHub.


Demo of Extending LINSTOR-Managed DRBD Volume to a DR Node

In this video, Matt Kereczman from LINBIT combines components of LINBIT SDS and LINBIT DR to demonstrate extending an existing LINSTOR-managed DRBD volume to a disaster recovery node, located in a geographically separated datacenter, via LINSTOR and DRBD Proxy.

Watch the video:

He’s already created a LINSTOR cluster on four nodes: linstor-a, linstor-b, linstor-c and linstor-dr.

You can see that linstor-dr is in a different network than our other three nodes. This network exists in the DR DC, which is connected to our local DC via a 40Mb/s WAN link.

He has a single DRBD resource defined, which is currently replicated synchronously between the three peers in our local datacenter. He’s listed out his LINSTOR-managed resources and volumes; the volume is currently mounted on linstor-a:

Before he adds a replica of this volume to the DR node in his DR datacenter, he’ll quickly test the write throughput of his DRBD device, so he has a baseline of how well it should perform.

He uses dd to run the test.


How to migrate manually created resources to LINSTOR

In many cases, existing DRBD resources that were created manually some time in the past can be migrated, so that LINSTOR can be used to manage those resources afterwards.

While resources managed by LINSTOR can coexist with resources that were created manually, it can often make sense to migrate such existing resources. When resources are migrated to LINSTOR, some settings in LINSTOR must be carefully adjusted to match the resource’s existing configuration.

Here is an overview of the migration process. It is, however, important to note that there are countless special cases, such as existing resources backed by different storage pools on different nodes or resources that use external metadata, where additional steps may be required for a successful migration.

This article describes the migration of an existing DRBD resource with a rather typical configuration.

Prerequisites

LINSTOR must be installed, configured and running already.

Sample configuration assumed by this article for the sample commands:

1 existing DRBD resource:
  Name:                 legacy
  TCP/IP port number:   15500

1 volume:
  Volume number:        10
  Minor number:         9100 (/dev/drbd9100)
  LVM volume group:     datastore
  Logical volume name:  legacy (/dev/datastore/legacy)
  Logical volume size:  1,340 MiB

2 nodes:
  Node eagle:           Node ID 4
  Node rabbit:          Node ID 8

Meta data type:         internal
Peer slots:             4

Create a resource definition for the existing resource

This may require renaming the resource in some cases, because LINSTOR’s naming rules are stricter than DRBD’s naming rules. The TCP/IP port number can be assigned automatically by LINSTOR if you don’t mind the change, and if a short disconnect/reconnect cycle during the migration is not going to cause problems. Otherwise, you can specify the port number manually to set it to the value that the resource is currently using.

 

resource-definition create legacy -p 15500

Adjust the DRBD peer slots count for the resource definition

If LINSTOR needs to create additional resources, or has to recreate DRBD meta data, the current peer count used by the resource’s volumes can be important, because it has a direct influence on the net size of the DRBD device that is created.

 

Note: If the peer count of a DRBD volume is unknown, it can be extracted from a meta data dump, which can be displayed using the drbdadm dump-md command (this requires the volume to be detached, or the resource to be stopped, and will normally require execution of the drbdadm apply-al command first).
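
A hedged sketch of that check for the sample resource named legacy (only do this while the resource can safely be taken down on this node):

drbdadm down legacy                     # stop the resource so the metadata can be read
drbdadm apply-al legacy                 # apply the activity log first, normally required by dump-md
drbdadm dump-md legacy | grep -i peer   # the dump contains the configured peer (slot) count
drbdadm up legacy                       # bring the resource back up afterwards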

 

resource-definition set-property legacy PeerSlotsNewResource 4

Create a volume definition

Manual resource creation typically involves specifying the size of the backend storage device, in this case a logical volume, and DRBD then uses whatever space is left after meta data creation. LINSTOR, in contrast, works with the net size of a DRBD volume.

To figure out the correct size to use for the LINSTOR volume definition, you can query the net size of the DRBD device using the blockdev utility:

blockdev --getsize64 /dev/drbd9100

LINSTOR stores volume sizes in kiB internally, and the value yielded by the blockdev utility is in bytes, so you will have to divide by 1024. There should be no remainder (modulo-division by 1024 should yield zero), unless you are dealing with a special case that might require additional steps to migrate successfully.

In the case of the sample configuration assumed by this article, a 1,340 MiB logical volume with DRBD meta data with a peer count of 4 will result in a net size of 1,371,956 kiB. This is the value to use for the volume definition in LINSTOR.
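
As a small worked example, the conversion can be scripted; this assumes the device from the sample configuration:

SIZE_BYTES=$(blockdev --getsize64 /dev/drbd9100)
echo $(( SIZE_BYTES / 1024 ))   # 1371956 for the sample volume, i.e. 1371956KiB
echo $(( SIZE_BYTES % 1024 ))   # should print 0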

The volume number (10) and the minor number (9100) should also be specified manually, unless you want LINSTOR to allocate new numbers automatically, which will typically require a reconfiguration of the applications or filesystem entries that are using the existing DRBD resource.

volume-definition create legacy -n 10 -m 9100 1371956KiB

If, for some reason, the backing storage volume is significantly larger than what would be required to fit the net size reported by the DRBD volume, then there is an additional property that can be set on the volume definition:

volume-definition set-property legacy 10 AllowLargerVolumeSize true

Setting this property is not normally required. It should be set if the LINSTOR satellite presents an error regarding the backing storage volume’s size being larger than expected.

Set the name of the manually created backing storage volume

LINSTOR normally generates the name for the backing storage volume, but the name of that volume can be overridden by setting a property on the volume definition as long as the backing storage volume name is the same on each node.

 

volume-definition set-property legacy 10 OverrideVlmId legacy

Create resources from the resource definition

Finally, resources must be created from the resource definition in LINSTOR. When creating resources, the resource’s DRBD node id on each node must match the node id used by the manually created resource. You can look up the node-id in the existing, manually created DRBD resource configuration file.

Before the final step of creating LINSTOR resources from the resource definition can be executed, the existing manually created DRBD resource configuration file must be moved away, otherwise drbdadm will complain about duplicate definitions.

mv /etc/drbd.d/legacy.res /etc/drbd.d/legacy.res.disabled

It is a good idea to stop the LINSTOR satellites before creating the resources, otherwise the DRBD resource will disconnect from other nodes as the first resource is created, and will then reconnect as the other resources (on other nodes) are added.

By creating the resources first and starting the satellites after all resources have been created, the satellites will immediately configure all connections, thereby normally avoiding a disconnect/reconnect cycle.

resource create --node-id 4 -s fatpool eagle legacy
resource create --node-id 8 -s fatpool rabbit legacy

(If the satellites are currently stopped, add the --async parameter to the command line to avoid having the client wait for the creation of each resource, which would not take place while the satellite is offline.)
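
For completeness, stopping and restarting the satellites around the resource creation could look like this, using the same systemd units as elsewhere in this article:

# On every node, before creating the resources
systemctl stop linstor-satellite

# (create the LINSTOR resources as shown above)

# On every node, after all resources have been created
systemctl start linstor-satellite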

 

After finishing the migration, LINSTOR can manage the DRBD resource just like resources that were originally created by LINSTOR.

 

Robert Altnoeder
Robert joined the LINBIT development team in 2013. He had worked with DRBD at a startup company in the SaaS field before joining LINBIT. His current primary field of work is the architecture and implementation of LINSTOR, the cluster management component of LINBIT's SDS software.

 


Replicating storage volumes on Scaleway ARM with LINSTOR

I’ve been using Scaleway for a while as a platform to spin-up both personal and work machines, mainly because they’re good value and easy to use. Scaleway offers a wide selection of Aarch64 and x86 machines at various price points, however none of these VMs are replicated – not even with RAID at the hardware level – you’re expected to handle that all yourself. Since ARM servers have been making headlines for several years as a competing architecture to x86 in the data center, I thought it would be interesting to set up replication across two ARM Scaleway VMs with DRBD and LINSTOR.

It’s worth pointing out here that if you’re planning on building a production HA environment on Scaleway, you should also reach out to their support team and have them confirm that your replicated volumes aren’t actually sitting on the same spinning disk in case of drive failure, as advised in their FAQ.

Preparing VMs


First, we need a couple of VMs with additional storage volumes to replicate. The ARM64-2GB VM doesn’t allow for mounting additional volumes, so let’s go for the next one up, and add an additional 50GB LSSD volume.

[Screenshot: Scaleway instance configuration with an additional 50GB LSSD volume]

I’ve gone with an Ubuntu image; if you selected an RPM-based image, substitute package manager commands accordingly. I want to run the following commands on all VMs (in my case I have two, and will be using the first as both my controller and a satellite node).

$ sudo apt update && sudo apt upgrade

In this case we’ll be deploying DRBD nodes with LINSTOR. We need DRBD9 to do this, but we can’t build a custom kernel module without first getting some prerequisite files for Scaleway’s custom kernel and preparing for a custom kernel module build. Scaleway provides a recommended script to run – we need to save that script and run it before installing DRBD9. I’ve put it in a file on github to make things simple:

$ sudo apt install -y build-essential libssl-dev
$ wget https://raw.githubusercontent.com/dabukalam/scalewaycustommodule/master/scalewaycustommodule
$ chmod +x scalewaycustommodule && sudo ./scalewaycustommodule

Getting LINSTOR

Once that’s done, we can add the LINBIT community repository and install DRBD, LINSTOR, and LVM:

$ sudo add-apt-repository -y ppa:linbit/linbit-drbd9-stack
$ sudo apt update
$ sudo apt install drbd-dkms linstor-satellite linstor-client lvm2

Now I can start the LINSTOR satellite service with:

$ sudo systemctl enable --now linstor-satellite

And make sure the VMs can see each other by adding the other node to each node’s hosts file:

[Screenshot: /etc/hosts entries for the peer node]

Let’s make sure LVM is running and create a volume group for LINSTOR on our additional volume:

$ systemctl enable --now lvm2-lvmetad.service
$ systemctl enable --now lvm2-lvmetad.socket
$ sudo vgcreate sw_ssd /dev/vdb

That’s it for commands you need to run on both nodes. From now on we’ll be running commands on our favorite VM. LINSTOR has four node types – Controller, Auxiliary, Combined, and Satellite. Since I only have two nodes, one will be Combined, and one will be a Satellite. Combined here means that the node is both a Controller and a Satellite.

Adding nodes to the LINSTOR cluster

So on our favorite VM, which we’re going to use as the combined node, we add the local host to the LINSTOR cluster as a combined node, and the other as a satellite:

$ sudo apt install -y linstor-controller
$ sudo systemctl enable --now linstor-controller
$ linstor node create --node-type Combined drbd-arm 10.10.43.13
$ linstor node create --node-type Satellite drbd-arm-2 10.10.25.5
$ linstor node list

It’s worth noting here that you can run commands to manage LINSTOR on any node; just make sure you have the controller node exported as an environment variable:

drbd-arm-2:~$ export LS_CONTROLLERS=drbd-arm

You should now have something that looks like this:

[Screenshot: linstor node list output showing both nodes online]

Now that we have our LINSTOR cluster set up, we can create a storage pool with the same name ‘swpool’ across the nodes, referencing the node name, specifying that we want lvm, and giving the volume group name:

$ linstor storage-pool create drbd-arm swpool lvm sw_ssd
$ linstor storage-pool create drbd-arm-2 swpool lvm sw_ssd

We can then create new resource and volume definitions, and use them to create the resource. You can perform a whole range of operations at this point, including manual node placement and specifying storage pools. Since we only have one storage pool, LINSTOR will automatically select that for us. I only have two nodes, so I’ll just auto-place the resource across both.

$ linstor resource-definition create backups
$ linstor volume-definition create backups 40G
$ linstor resource create backups --auto-place 2

LINSTOR will now handle all the resource creation automagically across all our nodes, including dealing with LVM and DRBD. If all succeeds, you should now be able to see your resources. They’ll be inconsistent while DRBD syncs them up. You can also now see the DRBD resources by running drbdmon. Once it’s finished syncing, drbdmon will list your replicated peers (only drbd-arm-2 in my case).

You can now mount the drive on any of the nodes and write to your new replicated storage cluster. To find the device name, list the volumes:

$ linstor resource list-volumes

[Screenshot: linstor resource list-volumes output showing the DRBD device]

In this case the device name is /dev/drbd1000, so once we create a filesystem on it and mount it, I can write to my new replicated storage cluster.

$ sudo mkfs /dev/drbd1000
$ sudo mount /dev/drbd1000 /mnt
$ sudo touch /mnt/file

 

 

Brian Hellman
Since 2008, Brian has been the Chief Operating Officer of LINBIT and the head of the LINBIT USA team, where he and his team have led continual double-digit growth over his tenure. Brian is committed to bringing High Availability, Disaster Recovery and Software-Defined Storage technologies to the Open Source community. Outside of LINBIT, Brian is a dedicated philanthropist through Oregon Freemasonry and The Shriners Hospital for Children.

A Highly Available LINSTOR Controller for Proxmox

THIS INFORMATION IS OUT OF DATE. READ THE UP TO DATE INFORMATION IN THE USER’S GUIDE

For the High Availability setup we describe in this blog post, we assume that you installed LINSTOR and the Proxmox Plugin as described in the Proxmox section of the user’s guide or our blog post.

The idea is to execute the LINSTOR controller within a VM that is controlled by Proxmox and its HA features, where the storage resides on DRBD, managed by LINSTOR itself.

Preparing the Storage

The first step is to allocate storage for the VM: create a VM and select “Do not use any media” in the “OS” section. The hard disk should reside on DRBD (e.g., “drbdstorage”). Disk space should be at least 2GB, and for RAM we chose 1GB. These are the minimal requirements for the appliance LINBIT provides to its customers (see below). If you set up your own controller VM, or resources are not constrained, increase these minimal values. In the following, we assume that the controller VM was created with ID 100, but it is fine if this VM is created later (after you have already created other VMs).

LINSTOR Controller Appliance

LINBIT provides an appliance for its customers that can be used to populate the created storage. For the appliance to work, we first create a “Serial Port”: click on “Hardware,” then on “Add,” and finally on “Serial Port.” See the image below:

[Screenshot: adding a serial port to the controller VM]

If everything worked as expected, the VM definition should then look like this:

[Screenshot: controller VM hardware definition including the serial port]

The next step is to copy the VM appliance to the created storage. This can be done with qemu-img. Make sure to replace the VM ID with the correct one:

# qemu-img dd -O raw if=/tmp/linbit-linstor-controller-amd64.img \
 of=/dev/drbd/by-res/vm-100-disk-1/0

After that, you can start the VM and connect to it via the Proxmox VNC viewer. The default user name and password are both “linbit”. Note that we kept the defaults for SSH, so you will not be able to log in to the VM via SSH and username/password. If you want to enable that (and/or “root” login), enable these settings in /etc/ssh/sshd_config and restart the ssh service. As this VM is based on “Ubuntu Bionic”, you should change your network settings (e.g., static IP) in /etc/netplan/config.yaml. After that you should be able to ssh to the VM:

[Screenshot: SSH session to the controller VM]
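
For reference, here is a minimal netplan sketch for a static IP; the interface name, addresses, and gateway are assumptions to adjust for your environment:

# /etc/netplan/config.yaml
network:
  version: 2
  ethernets:
    ens18:
      addresses: [10.43.7.254/24]
      gateway4: 10.43.7.1
      nameservers:
        addresses: [10.43.7.1]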

Adding the Controller VM to the existing Cluster

In the next step, you add the controller VM to the existing cluster:

# linstor node create --node-type Controller \
 linstor-controller 10.43.7.254

As this special VM will not be managed by the Proxmox Plugin, make sure all hosts have access to that VM’s storage. In our test cluster, we checked the linstor resource list to confirm where the storage was already deployed and then created further assignments via linstor resource create. In our lab consisting of four nodes, we made all resource assignments diskful, but diskless assignments are fine as well. As a rule of thumb, keep the redundancy count at “3” (more usually does not make sense), and assign the rest diskless.
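
A hedged example of what those extra assignments could look like; the node name and storage pool below are placeholders, and the resource name follows the vm-100-disk-1 example from above:

# See where the controller VM's disk is already deployed
linstor resource list
# Add another diskful replica of that disk on an additional node
linstor resource create pve-d vm-100-disk-1 --storage-pool drbdpool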

As the storage for this particular VM has to be made available (i.e., drbdadm up), enable the drbd.service on all nodes:

# systemctl enable drbd
# systemctl start drbd

At startup, the linstor-satellite service deletes all of its resource files (*.res) and regenerates them. This conflicts with the drbd service, which needs these resource files to start the controller VM. It is good enough to first bring up the resources via drbd.service and then start linstor-satellite.service. To make the necessary changes, you need to create a drop-in for the linstor-satellite.service via systemctl (do not edit the file directly).

# systemctl edit linstor-satellite
[Unit]
After=drbd.service

Switching to the New Controller

Now, it is time for the final steps — namely switching from the existing controller to the new one in the VM. Stop the old controller service on the old host, and copy the LINSTOR controller database to the VM:

# systemctl stop linstor-controller
# systemctl disable linstor-controller
# scp /var/lib/linstor/* root@10.43.7.254:/var/lib/linstor/

Finally, we can enable the controller in the VM:

# systemctl start linstor-controller # in the VM
# systemctl enable linstor-controller # in the VM

To check if everything worked as expected, you can query the cluster nodes on a host by asking the controller in the VM: linstor --controllers=10.43.7.254 node list. It is perfectly fine that the controller (which is just a controller and not “combined”) is shown as “OFFLINE”. Still, this might change in the future to something more appropriate.

As the last – but crucial – step, you need to add the “controllervm” option to /etc/pve/storage.cfg, and change the controller IP:

drbd: drbdstorage
  content images,rootdir
  redundancy 3
  controller 10.43.7.254
  controllervm 100

By setting the “controllervm” parameter, the plugin will ignore (or act accordingly on) actions that involve the controller VM. Basically, this VM should not be managed by the plugin, so the plugin mainly ignores all actions on the given controller VM ID. However, there is one exception. When you delete the VM in the GUI, it is removed from the GUI, and we did not find a way to prevent that. Yet such requests are ignored by the plugin, so the VM will not be deleted from the LINSTOR cluster. Therefore, it is possible to later create a VM with the ID of the old controller; the plugin will just return “OK”, and the old VM with the old data can be used again. To keep it simple, be careful not to delete the controller VM.

Enabling HA for the Controller VM in Proxmox

Currently, we have the controller running as a VM, but we should make sure that one instance of the VM is started at all times. For that we use Proxmox’s HA feature. Click on the VM; then on “More”; and then on “Manage HA.” We set the following parameters for our controller VM:

[Screenshot: Proxmox HA settings for the controller VM]

Final Considerations

As long as there are surviving nodes in your Proxmox cluster, everything should be fine. In case the node hosting the controller VM is shut down or lost, Proxmox HA will make sure the controller is started on another host. The IP of the controller VM should not change. It is up to you as admin to make sure this is the case (e.g., setting a static IP, or always providing the same IP via dhcp on the bridged interface).

One limitation that is not fully handled with this setup is a total cluster outage (e.g., common power supply failure) with a restart of all cluster nodes. Proxmox is unfortunately pretty limited in that regard. You can enable the “HA Feature” for a VM, and you can define “Start and Shutdown Order” constraints. But both are completely separated from each other. Therefore it is difficult to ensure that the controller VM is up and all other VMs are started.

It might be possible to work around that by delaying VM startup in the Proxmox plugin until the controller VM is up (i.e., if the plugin is asked to start the controller VM it does so, otherwise it waits and pings the controller). While this is a nice idea, it would fail in a serialized, non-concurrent VM start/plugin call event stream where some VM should be started (which then blocks) before the controller VM is scheduled to be started. That would obviously result in a deadlock.

We will discuss options with Proxmox, but we think the presented solution is valuable in typical use cases as is, especially compared to the complexity of a Pacemaker setup. Use cases where one can expect that the whole cluster does not go down at the same time are covered. And even if that happens, only the automatic startup of the VMs would not work when the whole cluster is started. In such a scenario, the admin just has to wait until the Proxmox HA service starts the controller VM. After that, all VMs can be started manually or via script on the command line.

Roland Kammerer
Software Engineer at Linbit
Roland Kammerer studied technical computer science at the Vienna University of Technology and graduated with distinction. Currently, he is a PhD candidate with a research focus on time-triggered realtime-systems and works for LINBIT in the DRBD development team.