
LINBIT USA Celebrates 10 Year Anniversary

LINBIT US is celebrating a decade of service and growth. Ten years ago, we started our journey with you from a newly established office in the Pacific Northwest. In that time, we have moved into new offices, quadrupled the size of our team, built some really great software, and, most importantly, met, collaborated with, and served some of the most sophisticated customers along the way. Here’s a snapshot of some of the major milestones, told in the present tense.

2010: Our bread and butter has always been High Availability. Our HA software, DRBD, enters the Linux mainline kernel with release 2.6.33. This promises to be a standout event: it makes enterprise-grade HA a standard capability within Linux and puts the open source community on par with the best proprietary systems out there.

2015: Fast forward to 2015. LINBIT is being talked about as the best solution for huge enterprises! Hundreds of thousands of servers depend on the replication that DRBD provides. All our customers are doing really cool work, and some of them are very well known, such as Cisco and Google. We are forming strong partnerships across North and South America – think Red Hat and SUSE.

New Horizon: Disaster Recovery

2016: Not only is the LINBIT HA product a success, but our new product focused on disaster recovery, DRBD Proxy, is proving to be incredibly useful to companies that need to replicate data across distances. LINBIT is having wonderful success in providing clients peace of mind in case a disaster strikes – or a clumsy admin pulls on some cables they weren’t supposed to be pulling on! Oh, and we can’t forget our fun videos that go along with these products: LINBIT DR, LINBIT HA, and LINBIT SDS.

More in 2016: The official release of DRBD9 to the public. A huge move for enterprises looking to keep multiple replicas of their data (up to 32!). Now, companies can implement software-defined storage (SDS) for creating, managing, and running a cloud storage environment.

New Kid on the Block: LINSTOR

2018: Now that SDS is a feature, many clients are looking for it. LINBIT is making it even easier, and more practical, with the release of LINSTOR. With LINSTOR, everything is automated; deploying a DRBD volume has never been easier.

2018: At this point we would be remiss if we didn’t mention that LINSTOR has Flex Volume & External Provisioner drivers for Kubernetes. We now provide persistent storage to high performance containerized applications! Here is a LINSTOR demo, showing you just how quick and easy it is to deploy a DRBD cluster with 20 resources.

Now: A new guide describes DRBD for the Microsoft Azure cloud service. We have partners and resellers whose end clients run Windows servers that need HA. One of our engineers even created a video of an NFS failover in Azure!

What else? There is almost too much to say about the past 10 years and the amount of growth and change is astonishing. However, at our core, we are the same. We believe in open source. In building software that turns the difficult into fast, robust, and easy. In our clients. In our company.

“We are grateful”

During a conversation at Red Hat Summit this year, LINBIT COO Brian Hellman was asked how long he had been at LINBIT. “I replied ‘10 years in September.’ The gentleman was surprised: ‘That’s a long time, especially in the tech industry.’ To which I replied, ‘I love what I do and the people I work with — not only the members of the LINBIT team, but also our customers, partners, and our extended team. Without them we wouldn’t be here; they make it all possible, and for that we are grateful.’”

To whoever is reading this, wherever you are: you were part of it. You ARE part of it! So a big thank you for reading, caring, and hopefully using LINBIT HA, LINBIT DR, or LINBIT SDS. Cheers to another 10 years!

——-


Replicating storage volumes on Scaleway ARM with LINSTOR

I’ve been using Scaleway for a while as a platform to spin up both personal and work machines, mainly because they’re good value and easy to use. Scaleway offers a wide selection of AArch64 and x86 machines at various price points; however, none of these VMs are replicated – not even with RAID at the hardware level – so you’re expected to handle all that yourself. Since ARM servers have been making headlines for several years as a competing architecture to x86 in the data center, I thought it would be interesting to set up replication across two ARM Scaleway VMs with DRBD and LINSTOR.

It’s worth pointing out here that if you’re planning on building a production HA environment on Scaleway, you should also reach out to their support team and have them confirm that your replicated volumes aren’t actually sitting on the same spinning disk – so that a single drive failure can’t take out both replicas – as advised in their FAQ.

Preparing VMs

[Image: Linstor scaleway drbd-arm 6]

First, we need a couple of VMs with additional storage volumes to replicate. The ARM64-2GB VM doesn’t allow for mounting additional volumes, so let’s go for the next one up, and add an additional 50GB LSSD volume.

[Image: Linstor scaleway drbd-arm 2]

I’ve gone with an Ubuntu image; if you selected an RPM-based image, substitute package manager commands accordingly. Run the following commands on all VMs (in my case I have two, and will be using the first as both my controller and a satellite node).

$ sudo apt update && sudo apt upgrade

In this case we’ll be deploying DRBD nodes with LINSTOR. We need DRBD9 to do this, but we can’t build a custom kernel module without first fetching the prerequisite files for Scaleway’s custom kernel and preparing the build environment. Scaleway provides a recommended script for this – we need to save that script and run it before installing DRBD9. I’ve put it in a file on GitHub to make things simple:

$ sudo apt install -y build-essential libssl-dev
$ wget https://raw.githubusercontent.com/dabukalam/scalewaycustommodule/master/scalewaycustommodule
$ chmod +x scalewaycustommodule && sudo ./scalewaycustommodule

Getting LINSTOR

Once that’s done, we can add the LINBIT community repository and install DRBD, LINSTOR, and LVM:

$ sudo add-apt-repository -y ppa:linbit/linbit-drbd9-stack
$ sudo apt update
$ sudo apt install drbd-dkms linstor-satellite linstor-client lvm2

Now I can start the LINSTOR satellite service with:

$ sudo systemctl enable --now linstor-satellite

And make sure the VMs can see each other by adding the other node to each host’s /etc/hosts file:

[Image: Linstor scaleway drbd-arm 3]
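For example, each node gains an entry for its peer (using the private IPs from this setup; substitute your own):

```
# /etc/hosts on drbd-arm
10.10.25.5    drbd-arm-2

# /etc/hosts on drbd-arm-2
10.10.43.13   drbd-arm
```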

Let’s make sure LVM is running and create a volume group for LINSTOR on our additional volume:

$ sudo systemctl enable --now lvm2-lvmetad.service
$ sudo systemctl enable --now lvm2-lvmetad.socket
$ sudo vgcreate sw_ssd /dev/vdb

That’s it for commands you need to run on both nodes. From now on we’ll be running commands on our favorite VM. LINSTOR has four node types – Controller, Auxiliary, Combined, and Satellite. Since I only have two nodes, one will be Combined, and one will be a Satellite. Combined here means that the node is both a Controller and a Satellite.

Adding nodes to the LINSTOR cluster

So on our favorite VM, which we’re going to use as the combined node, we add the local host to the LINSTOR cluster as a combined node, and the other as a satellite:

$ sudo apt install -y linstor-controller
$ sudo systemctl enable --now linstor-controller
$ linstor node create --node-type Combined drbd-arm 10.10.43.13
$ linstor node create --node-type Satellite drbd-arm-2 10.10.25.5
$ linstor node list

It’s worth noting here that you can run commands to manage LINSTOR on any node; just make sure you have the controller node exported as a variable:

drbd-arm-2:~$ export LS_CONTROLLERS=drbd-arm

You should now have something that looks like this:

[Image: Linstor scaleway drbd-arm 4]

Now that we have our LINSTOR cluster set up, we can create a storage pool with the same name (‘swpool’) on each node, referencing the node name and specifying the LVM backend and the volume group name:

$ linstor storage-pool create drbd-arm swpool lvm sw_ssd
$ linstor storage-pool create drbd-arm-2 swpool lvm sw_ssd

We can then create a resource definition and a volume definition, and use them to create the resource itself. You can perform a whole range of operations at this point, including manual node placement and specifying storage pools. Since we only have one storage pool, LINSTOR will select it automatically. I only have two nodes, so I’ll just auto-place the resource across both:

$ linstor resource-definition create backups
$ linstor volume-definition create backups 40G
$ linstor resource create backups --auto-place 2

LINSTOR will now handle all the resource creation automagically across all our nodes, including dealing with LVM and DRBD. If all succeeds, you should now be able to see your resources; they’ll be Inconsistent while DRBD syncs them up. You can also see the DRBD resources by running drbdmon. Once syncing has finished, you’ll see a list of your replicated nodes as below (only drbd-arm-2 in my case):

You can now mount the drive on any of the nodes and write to your new replicated storage cluster.

$ linstor resource list-volumes

[Image: Linstor scaleway drbd-arm 7]

In this case the device name is /dev/drbd1000, so once we create a filesystem on it and mount it, we can write to our new replicated storage cluster.

$ sudo mkfs /dev/drbd1000
$ sudo mount /dev/drbd1000 /mnt
$ sudo touch /mnt/file

 

 

Danny Abukalam on Linkedin
Danny Abukalam
Danny is a Solutions Architect at LINBIT based in Manchester, UK. He works in conjunction with the sales team to support customers with LINBIT's products and services. Danny has been active in the OpenStack community for a few years, organising events in the UK including the Manchester OpenStack Meetup and OpenStack Days UK. In his free time, Danny likes hunting for extremely hoppy IPAs and skiing, not at the same time.

A Highly Available LINSTOR Controller for Proxmox

For the High Availability setup we describe in this blog post, we assume that you have installed LINSTOR and the Proxmox Plugin as described in the Proxmox section of the user’s guide or in our blog post.

The idea is to execute the LINSTOR controller within a VM that is controlled by Proxmox and its HA features, where the storage resides on DRBD, managed by LINSTOR itself.

Preparing the Storage

The first step is to allocate storage for the VM by creating a VM and selecting “Do not use any media” in the “OS” section. The hard disk should reside on DRBD (e.g., “drbdstorage”). Disk space should be at least 2GB, and for RAM we chose 1GB. These are the minimal requirements for the appliance LINBIT provides to its customers (see below). If you set up your own controller VM, or if resources are not constrained, increase these minimal values. In the following, we assume that the controller VM was created with ID 100, but it is fine if this VM is created later (after you have already created other VMs).

LINSTOR Controller Appliance

LINBIT provides an appliance for its customers that can be used to populate the created storage. For the appliance to work, we first create a “Serial Port”: click on “Hardware,” then on “Add,” and finally on “Serial Port.” See the image below:

[Image: proxmox_serial1_controller_vm]

If everything worked as expected, the VM definition should then look like this:

[Image: proxmox_add_serial2_controller_vm]

The next step is to copy the VM appliance to the created storage. This can be done with qemu-img. Make sure to replace the VM ID with the correct one:

# qemu-img dd -O raw if=/tmp/linbit-linstor-controller-amd64.img \
 of=/dev/drbd/by-res/vm-100-disk-1/0

After that, you can start the VM and connect to it via the Proxmox VNC viewer. The default user name and password are both “linbit”. Note that we kept the defaults for SSH, so you will not be able to log in to the VM via SSH and username/password. If you want to enable that (and/or “root” login), enable these settings in /etc/ssh/sshd_config and restart the ssh service. As this VM is based on “Ubuntu Bionic”, you should change your network settings (e.g., static IP) in /etc/netplan/config.yaml. After that you should be able to ssh to the VM:

[Image: proxmox_ssh_controller_vm]

Adding the Controller VM to the existing Cluster

In the next step, you add the controller VM to the existing cluster:

# linstor node create --node-type Controller \
 linstor-controller 10.43.7.254

As this special VM will not be managed by the Proxmox Plugin, make sure all hosts have access to that VM’s storage. In our test cluster, we checked the linstor resource list to confirm where the storage was already deployed, and then created further assignments via linstor resource create. In our lab consisting of four nodes, we made all resource assignments diskful, but diskless assignments are fine as well. As a rule of thumb, keep the redundancy count at “3” (more usually does not make sense), and assign the rest diskless.

As the storage for this particular VM has to be made available (i.e., drbdadm up), enable the drbd.service on all nodes:

# systemctl enable drbd
# systemctl start drbd

At startup, the linstor-satellite service deletes all of its resource files (.res) and regenerates them. This conflicts with the drbd service, which needs these resource files to start the controller VM. Recent LINSTOR releases support a -k/--keep-res parameter to which one can pass a regular expression; resource files matching this expression are not deleted. To make the necessary changes, edit the service file via systemctl (do not edit the file directly):

# systemctl edit linstor-satellite
## Change the "ExecStart" line to include: --keep-res=vm-100
## "vm-100" (if 100 is your VM ID) is good enough; remember, it is a regular expression
## You need to include "[Service]" and you need to reset the old value
## The file should look like this:
[Service]
ExecStart=
ExecStart=/usr/share/linstor-server/bin/Satellite --logs=/var/log/linstor-satellite --config-directory=/etc/linstor --keep-res=vm-100

Switching to the New Controller

Now, it is time for the final steps — namely switching from the existing controller to the new one in the VM. Stop the old controller service on the old host, and copy the LINSTOR controller database to the VM:

# systemctl stop linstor-controller
# systemctl disable linstor-controller
# scp /var/lib/linstor/* [email protected]:/var/lib/linstor/

Finally, we can enable the controller in the VM:

# systemctl start linstor-controller # in the VM
# systemctl enable linstor-controller # in the VM

To check if everything worked as expected, you can query the cluster nodes on a host by asking the controller in the VM: linstor --controllers=10.43.7.254 node list. It is perfectly fine that the controller (which is just a controller, not “combined”) is shown as “OFFLINE”; this might change in the future to something more appropriate.

As the last – but crucial – step, you need to add the “controllervm” option to /etc/pve/storage.cfg, and change the controller IP:

drbd: drbdstorage
  content images,rootdir
  redundancy 3
  controller 10.43.7.254
  controllervm 100

By setting the “controllervm” parameter, the plugin will ignore actions on the controller VM (or act accordingly). Basically, this VM should not be managed by the plugin, so the plugin mostly ignores all actions on the given controller VM ID. However, there is one exception: when you delete the VM in the GUI, it is removed from the GUI, and we did not find a way to prevent that or bring it back so that the VM stays visible there. Such delete requests are still ignored by the plugin, though, so the VM will not be deleted from the LINSTOR cluster, and it is therefore possible to later create a VM with the ID of the old controller. The plugin will just return “OK”, and the old VM with the old data can be used again. To keep it simple: be careful not to delete the controller VM.

Enabling HA for the Controller VM in Proxmox

Currently, we have the controller running as a VM, but we should make sure that one instance of the VM is started at all times. For that we use Proxmox’s HA feature. Click on the VM, then on “More,” and then on “Manage HA.” We set the following parameters for our controller VM:

[Image: promox_manage_ha_controller_vm]

Final Considerations

As long as there are surviving nodes in your Proxmox cluster, everything should be fine. In case the node hosting the controller VM is shut down or lost, Proxmox HA will make sure the controller is started on another host. The IP of the controller VM should not change. It is up to you as admin to make sure this is the case (e.g., setting a static IP, or always providing the same IP via dhcp on the bridged interface).

One limitation that is not fully handled with this setup is a total cluster outage (e.g., common power supply failure) with a restart of all cluster nodes. Proxmox is unfortunately pretty limited in that regard. You can enable the “HA Feature” for a VM, and you can define “Start and Shutdown Order” constraints. But both are completely separated from each other. Therefore it is difficult to ensure that the controller VM is up and all other VMs are started.

It might be possible to work around that by delaying VM startup in the Proxmox plugin until the controller VM is up (i.e., if the plugin is asked to start the controller VM it does so; otherwise it waits and pings the controller). While this is a nice idea, it would fail in a serialized, non-concurrent VM start/plugin call event stream where some VM is started (and then blocks) before the controller VM is scheduled to be started. That would obviously result in a deadlock.

We will discuss options with Proxmox, but we think the presented solution is valuable in typical use cases as is, especially compared to the complexity of a Pacemaker setup. Use cases where one can expect that the whole cluster does not go down at the same time are covered. And even if it does, the only thing that would not work when the whole cluster is restarted is automatic startup of the VMs. In such a scenario, the admin just has to wait until the Proxmox HA service starts the controller VM. After that, all VMs can be started manually or scripted on the command line.

Roland Kammerer
Software Engineer at Linbit
Roland Kammerer studied technical computer science at the Vienna University of Technology and graduated with distinction. Currently, he is a PhD candidate with a research focus on time-triggered realtime-systems and works for LINBIT in the DRBD development team.

 


How to setup LINSTOR on Proxmox VE

In this technical blog post, we show you how to integrate DRBD volumes into Proxmox VE via a storage plugin developed by LINBIT. The advantages of using DRBD include a configurable number of data replicas (e.g., 3 copies in a 5-node cluster) and access to the data on every node, which enables very fast VM live migrations (usually only a few seconds, depending on memory pressure).

Download Linstor Proxmox Plugin

Setup

The rest of this post assumes that you have already set up Proxmox VE (the LINBIT example uses 4 nodes) and have created a PVE cluster consisting of all nodes. While this post is not meant to replace the DRBD User’s Guide, we try to show a complete setup.

The setup consists of two important components:

  1. LINSTOR, which manages DRBD resource allocation
  2. The linstor-proxmox plugin, which implements the Proxmox VE storage plugin API and executes LINSTOR commands.

In order for the plugin to work, you must first create a LINSTOR cluster.

LINSTOR Cluster

We assume here that you have already set up the LINBIT Proxmox repository as described in the User’s Guide. If you have not completed this setup, execute the following commands on all cluster nodes. First, we need the low-level infrastructure (i.e., the DRBD9 kernel module and drbd-utils):

apt install pve-headers
apt install drbd-dkms drbd-utils
rmmod drbd; modprobe drbd
grep -q drbd /etc/modules || echo "drbd" >> /etc/modules

The next step is to install LINSTOR:

apt install linstor-controller linstor-satellite linstor-client
systemctl start linstor-satellite
systemctl enable linstor-satellite

Now, decide which of your hosts should be the current controller node and enable the linstor-controller service on that particular node only:

systemctl start linstor-controller

Volume creation

Obviously, DRBD needs storage to create volumes. In this post we assume a setup where all nodes contain an LVM-thinpool called drbdpool. In our sample setup, we created it on the pve volume group, but in your setup, you might have a different storage topology. On the node that runs the controller service, execute the following commands to add your nodes:

linstor node create alpha 10.0.0.1 --node-type Combined
linstor node create bravo 10.0.0.2 --node-type Combined
linstor node create charlie 10.0.0.3 --node-type Combined
linstor node create delta 10.0.0.4 --node-type Combined

“Combined” means that this node is allowed to execute a LINSTOR controller and/or a satellite, but a node does not have to execute both. So it is safe to specify “Combined”; it does not influence the performance or the number of services started.

The next step is to configure a storage pool definition. As described in the User’s guide, most LINSTOR objects consist of a “definition” and then concrete instances of such a definition:

linstor storage-pool-definition create drbdpool

Now is a good time to mention that the LINSTOR client provides handy shortcuts for its sub-commands; the previous command could have been written as linstor spd c drbdpool. The next step is to register every node’s storage pool:

for n in alpha bravo charlie delta; do \
linstor storage-pool create $n drbdpool lvmthin pve/drbdpool; \
done

DRBD resource creation

After that we are ready to create our first real DRBD resource:

linstor resource-definition create first
linstor volume-definition create first 10M --storage-pool drbdpool
linstor resource create alpha first
linstor resource create bravo first

Now, check with drbdadm status that “alpha” and “bravo” contain a replicated DRBD resource called “first”. After that, this dummy resource can be deleted on all nodes by deleting its resource definition:

linstor resource-definition delete -q first

LINSTOR Proxmox VE Plugin Setup

As DRBD and LINSTOR are already set up, the only things missing are the plugin itself and its configuration.

apt install linstor-proxmox

The plugin is configured via the file /etc/pve/storage.cfg:

drbd: drbdstorage
  content images,rootdir
  redundancy 2
  controller 10.0.0.1

It is not necessary to copy that file to the other nodes, as /etc/pve is already a replicated file system. After the configuration is done, you should restart the following service:

systemctl restart pvedaemon

After this setup is done, you are able to create virtual machines backed by DRBD from the GUI. To do so, select “drbdstorage” as storage in the “Hard Disk” section of the VM. LINSTOR selects the nodes that have the most free storage to create the replicated backing devices.
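To build intuition for that selection, here is a hypothetical Python sketch of a “most free space” placement heuristic – a simplification for illustration, not LINSTOR’s actual algorithm (the node names and sizes are made up):

```python
# Hypothetical sketch of auto-placement by free space; NOT LINSTOR's code.
# Given each node's free space, pick the N nodes with the most room.

def auto_place(free_space_gib: dict, replicas: int) -> list:
    """Return the `replicas` node names with the most free space."""
    ranked = sorted(free_space_gib, key=free_space_gib.get, reverse=True)
    return ranked[:replicas]

# Example pool state (invented numbers, in GiB):
pool = {"alpha": 500, "bravo": 800, "charlie": 300, "delta": 750}
print(auto_place(pool, 2))  # ['bravo', 'delta']
```

With redundancy set to 2, the two nodes with the most free space back the new volume; any other node can still reach the data as a diskless client, as described below.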

Distribution

The interested reader can check which nodes were selected via linstor resource list. While interesting, it is more important to know that the storage can be accessed by all nodes in the cluster via a DRBD feature called “diskless clients”. So let’s assume “alpha” and “bravo” had the most free space and were selected, and the VM was created on node “bravo”. Via the low-level tool drbdadm status we now see that the resource is created on two nodes (i.e., “alpha” and “bravo”) and that the DRBD resource is in “Primary” role on “bravo”.

Now we want to migrate the VM from “bravo” to node “charlie”. This is again done via a few clicks in the GUI, but the interesting steps happen behind the scenes: the storage plugin realizes that it has access to the data on “alpha” and “bravo” (our two replicas) but also needs access on “charlie” to execute the VM. The plugin therefore creates a diskless assignment on “charlie”. When you execute drbdadm status on “charlie”, you see that three nodes are now involved in the overall picture:

• Alpha with storage in Secondary role
• Bravo with storage in Secondary role
• Charlie as a diskless client in Primary role

Diskless clients are created (and deleted) on demand without further user interaction, besides moving around VMs in the GUI. This means that if you now move the VM back to “bravo”, the diskless assignment on “charlie” gets deleted as it is no longer needed.

If you had moved the VM from “charlie” to “delta” instead, the diskless assignment for “charlie” would have been deleted, and a new one for “delta” would have been created.

Probably even more interesting: all of this, including VM migration, happens within seconds, without moving the actual replicated storage contents.

Next Steps

So far, we created a replicated and highly-available setup for our VMs, but the LINSTOR controller and especially its database are not highly-available. In a future blog post, we will describe how to make the controller itself highly-available by only using software already included in Proxmox VE (i.e., without introducing complex technologies like Pacemaker). This will be achieved with a dedicated controller VM that will be provided by LINBIT as an appliance.

Roland Kammerer
Software Engineer at Linbit

Split Brain? Never Again! A New Solution for an Old Problem: DRBD Quorum

While attending OpenStack Summit in Atlanta, I sat in a talk about the difficulties of implementing High Availability (HA) clusters. At one point, the speaker presented a picture of a split brain and discussed the challenges of resolving it and of implementing STONITH in certain environments. As many of you know, “split brain” is a condition that can happen when each node in a cluster thinks that it is the only active node. The system as a whole loses grip on its “state”; nodes can go rogue, and data sets can diverge without making it clear which one is primary. Data loss or data corruption can result, but there are ways to make sure this doesn’t happen, so I was interested in probing further.

Fencing is not always the solution

[Image: split brain]

The Split brain problem can be solved by DRBD Quorum.

To make it more interesting, it turned out that the speaker’s company uses DRBD and Pacemaker for HA, a setup that is very familiar to us. After the talk, I approached the speaker and recommended that they consider “fencing” as a way to avoid split brain. Fencing regulates access to a shared resource and can be a good safeguard, but best practices suggest that it should not use the same communication path as the one it is trying to regulate, so it needs a separate path. Unfortunately, in his environment, redundant networking was not possible. We needed another method.

Split brain is solved via DRBD Quorum

After talking to the speaker, it was clear to me that a new option for avoiding split brain and diverging data sets was needed, since existing solutions may not always be feasible in certain infrastructures. This got me thinking about the various options for avoiding split brain and how fencing could be implemented using the built-in communication found in DRBD 9. It turns out that DRBD 9’s capability of mirroring across more than two nodes makes a viable solution possible.

That idea sparked the work on the newest feature in DRBD: Quorum.

Shortly thereafter, the LINBIT team developed and integrated a working solution into DRBD. The code was pushed to the LINBIT repository and ready for testing.

Interest was almost immediate!

Later on, I happened to meet a few folks from IBM UK. They were working on IBM MQ Advanced Software, the well-known messaging middleware software that helps integrate applications and data across multiple platforms. They intended to use DRBD for their replication needs and quickly became interested in the idea of using a Quorum mechanism to mitigate split-brain situations.

DRBD Quorum takes new perspective

The DRBD Quorum feature takes a new approach to avoiding data divergence. A cluster partition may only modify the replicated data set if the number of nodes that can communicate is greater than half of the overall number of nodes within the defined cluster. By only allowing writes in a partition that contains more than half of the cluster’s nodes, we avoid creating diverging data sets.
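The majority rule itself is simple enough to sketch in a few lines of Python (an illustration of the rule only, not LINBIT’s implementation):

```python
# Illustration of the DRBD quorum rule: a partition may write only if it
# can communicate with a strict majority of all nodes in the cluster.

def has_quorum(reachable: int, total: int) -> bool:
    """True if `reachable` nodes (including this one) form a majority."""
    return reachable > total / 2

# Three-node cluster: a 2-node partition keeps quorum, while the isolated
# node loses it -- so both sides can never write at once and diverge.
assert has_quorum(2, 3)
assert not has_quorum(1, 3)
# With two nodes, a 1-node partition is exactly half -- not a majority.
# This is why quorum wants at least three nodes (or replicas).
assert not has_quorum(1, 2)
print("quorum rule holds")
```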

The initial implementation of this feature would cause any node that lost quorum (and was running the application/data set) to be rebooted; removing access to the data set is required to ensure that the node stops modifying data. After extensive testing, the IBM team suggested a refinement: instead of rebooting the node, terminate the application. This triggers the already available recovery process, forcing services to migrate to a node with quorum!

Attractive alternative to fencing

As usual, the devil is in the details. Getting the implementation right, with the appropriate resync decisions, was not as straightforward as one might think. In addition to our own internal testing, many IBM engineers tested it as well. We are happy to report that the current implementation does exactly what was expected!

Bottom line:

If you need to mirror your data set three times, the new DRBD Quorum feature is an attractive alternative to hardware fencing.
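As a rough sketch (the resource name is a placeholder, and the exact option set may differ between DRBD versions; consult the user’s guide for the authoritative syntax), enabling quorum in a DRBD 9 resource configuration looks roughly like this:

```
resource r0 {
  options {
    # allow writes only in a partition containing a majority of nodes
    quorum majority;
    # what to do when quorum is lost while the device is in use
    on-no-quorum io-error;
  }
  # ... three or more nodes defined here ...
}
```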

If you want to learn more about the Quorum implementation in DRBD, please see the DRBD9 user’s guide:
https://docs.linbit.com/docs/users-guide-9.0/#s-feature-quorum
https://docs.linbit.com/docs/users-guide-9.0/#s-configuring-quorum

Image: Lloyd Fugde – stock.adobe.com

Philipp Reisner on Linkedin
Philipp Reisner
Philipp Reisner is founder and CEO of LINBIT in Vienna/Austria. His professional career has been dominated by developing DRBD, a storage replication for Linux. Today he leads a company of about 30 employees with locations in Vienna/Austria and Portland/Oregon.

 

 

LINBIT’s DRBD ships with integration to VCS

The LINBIT DRBD software has been updated with an integration for Veritas InfoScale Availability (VIA). VIA, formerly known as Veritas Cluster Server (VCS), is a proprietary cluster manager for building highly available clusters on Linux – for example, for network file sharing, databases, or e-commerce websites. VCS solves the same problem as the open source Pacemaker project.

Yet, in contrast to Pacemaker, VCS has a long history on the Unix platform; it came to Linux as Linux began to surpass legacy Unix platforms. In addition to its longevity, VCS has a strong and clean user experience. For example, VCS is ahead of Pacemaker when it comes to clarity of log files. Notably, VCS has slightly fewer features than Pacemaker. (With great power comes complexity!)

[Image: Gear-drbd-integration-VCS]

The gear runs even smoother. DRBD has an integration for VCS.

VCS integration for DRBD

Since January 2018, DRBD has been shipping with an integration for VCS. Users are now able to use VCS instead of Pacemaker and even control DRBD via VCS. The integration consists of two agents, DRBDConfigure and DRBDPrimary, which enable drbd-8.4 and drbd-9.0 for VCS.

Full documentation can be found here on our website:

https://docs.linbit.com/docs/users-guide-9.0/#s-feature-VCS

and

https://github.com/LINBIT/drbd-utils/tree/master/scripts/VCS

Besides VCS Linbit DRBD supports variety of Linux software so you can keep your system up and running.


Pacemaker 1.0.11 and up
Heartbeat 3.0.5 and up
Corosync 2.x and up
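For comparison with the VCS path, a DRBD resource is commonly placed under Pacemaker's control with a crm shell configuration along these lines. The resource name `r0` and the operation timings are assumptions for illustration; adapt them to your cluster.

```
# Hypothetical crm configuration fragment for DRBD under Pacemaker.
# Define the DRBD resource using LINBIT's OCF agent.
primitive drbd_r0 ocf:linbit:drbd \
    params drbd_resource=r0 \
    op monitor interval=60s

# Wrap it in a master/slave resource so Pacemaker decides which
# node runs as Primary; two clones for a two-node cluster.
ms ms_drbd_r0 drbd_r0 \
    meta master-max=1 master-node-max=1 \
         clone-max=2 clone-node-max=1 notify=true
```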

 

Reach out to [email protected] for more information.

We are driven by the passion of keeping the digital world running. That’s why hundreds of customers trust in our expertise, services, and products. Our open source product DRBD has been installed several million times. LINBIT established DRBD® as the industry standard for high availability (HA) and data redundancy for mission-critical systems. DRBD enables disaster recovery and HA for any application on Linux, including iSCSI, NFS, MySQL, Postgres, Oracle, virtualization, and more.


 

Dreaded Day of Downtime

Some say that no one dreads a day of downtime like a storage admin.

I disagree. Sure, the storage admins might be responsible for recovering a whole organization if an outage occurs; and sure, they might be the ones who lose their jobs from an unexpected debacle, but I would speculate that others have more to lose.

First, the company’s reputation takes a big, possibly irreparable hit with both clients and employees. Damage control usually lasts far longer than the original outage. Take the United Airlines case from earlier in 2017, when a computer malfunction led to the grounding of all domestic flights. Airports across the country were forced to tweet out messages about the technical issues after receiving an overwhelming number of complaints. After an outage like this, it can take months or years to repair customers’ trust. Depending upon the criticality of the services, a company could even go bankrupt. Despite all this, the company isn’t the biggest loser; the end user is, and that is who the rest of this post will focus on.

Let’s say you’re a senior in college. It’s spring term, and graduation is just one week away. Your school has an online system for submitting assignments, which are due at midnight the day before finals week. Like most students at the school, you log into the online assignment submission module, just like you always have. Except this time, you get a spinning wheel. Nothing will load. It must be your internet connection. You call a friend to have her submit your papers, but she can’t log in either. The culprit: the system is down.

Now it’s 10:00 PM, and you need to submit your math assignment before midnight. At 11:00 PM you start to panic. You can’t log in, and neither can your classmates. Everyone is scrambling. You send a hastily written email to your professor explaining the issue. She is unforgiving, because you shouldn’t have procrastinated in the first place. At 1:00 AM, you refresh the system and everything is working (slowly), but the deadlines have passed. The system won’t let you submit anything. Your heart sinks as you realize that without that project, you will fail your math class and not be able to graduate.

This system outage caused heartache, stress, and uncertainty for the students and teachers, along with a whole lot of pain for the administrators. The kicker is that the downtime happened when traffic was anticipated to be at its highest! Of course the servers are going to be overloaded during the last week of spring term. Yet, predictably, the university will send an email stating that it experienced higher-than-expected loads and that, ultimately, it wasn’t prepared for them.

During this time, traffic was 15 times its normal volume, and the hypervisor hosting the NFS server and the file sharing system was flooded with requests. It blew a fan and eventually overheated. Sure, the data was still safe inside the SAN on the backend. However, none of that mattered when the students couldn’t access the data until the admin rebuilt the hypervisor. By the time the server was back up and running, the damage was done.

High availability isn’t a simple concept, but it is critical for your organization, your credibility, and, even more importantly, for your end users or customers. In today’s world, the bar for uptime is monstrously high, so downtime is simply unacceptable.

If you’re a student, an admin, or a simple system user, I have a question for you (and don’t just think about yourself; think about your boss, colleagues, and clients):

What would your day look like if your services went unresponsive right… NOW?!
Learn more about the costs and drivers of data loss, and how to avoid it, by reading the paper from OrionX Research.

 

Greg Eckert on Linkedin
Greg Eckert
In his role as the Director of Business Development for LINBIT America and Australia, Greg is responsible for building international relations, both in terms of technology and business collaboration. Since 2013, Greg has connected potential technology partners, collaborated with businesses in new territories, and explored opportunities for new joint ventures.

The Top Issues and Topics for HA-DR in 2018

2017 is coming to a close, and it is a good time to look back and then look forward. Thank you to our customers, partners, and the broader open source community for your participation; 2017 was a year of many accomplishments for LINBIT. We celebrated over 1.6 million downloads of DRBD, expanded into China, and released four new technical guides: HA NFS on RHEL 7, HA iSCSI on RHEL 7, HA & DR for ActiveMQ, and DRBD with Encryption.

Don’t Settle for Downtime

Innovative Data Storage Can Save Cash, Headaches, and Your Data

Storage Downtime is Unacceptable

When the network goes down, everyone is mildly annoyed, but when the storage goes down, “everyone loses their mind,” as the Joker would say. And for good reason. No one likes losing payroll data, shipments, customer information, financial transactions, or CRM information. And they certainly don’t like waiting while you roll back to your latest backup. Internally and externally, data loss and downtime waste valuable resources and hurt a company’s reputation. Downtime is becoming less acceptable every day, and data loss even more so. Stable, safe, and secure storage should be a priority for those responsible for protecting their business (just ask Equifax).

Traditional Solutions

Due to the increasing need for high availability (HA) and disaster recovery (DR), proprietary storage companies like NetApp and Dell EMC have provided SAN and NAS technologies to protect your organization’s most important data. These hardware appliances often have no single point of failure, offer synchronous data replication, and even include a nice GUI so that users can point and click their way around. The downside? These storage appliances aren’t scalable, and they are expensive. Really expensive.

The Obvious (or not so obvious) Alternative

Did you know that resiliency is built into your Linux OS? That’s right: built into the mainline Linux kernel is everything you need to replace your shared storage. For over 15 years, LINBIT has been creating the DRBD software, designed to synchronously and seamlessly replicate data between Linux servers, just like your SAN. It can even trick the applications above into believing they are writing to a SAN, when in reality the data lives on standard x86, ARM, or POWER boxes. The full LINBIT HA solution combines the DRBD software with open source fail-over software as well. This combination eliminates the need for proprietary shared storage solutions. So, why aren’t you using it? You probably didn’t know that it existed.

 

For the past 20 years, those with IT know-how and small budgets have found that HA clustering, using commodity off-the-shelf hardware, was an affordable alternative to traditional storage methods. This crowd consisted of the standard Linux hacker rolling out a home-brewed web server, and the hyperscale players who didn’t want to rely on outside vendors to build their clouds. Because these hyperscale companies use the software to create a competitive advantage over their rivals, they aren’t all that eager to share their stories. They have kept the mid-market in the dark.

Almost all of the major players (including Google, Cisco, Deka Bank, HP, Porsche, and the BBC) have realized that using standard hardware instead of proprietary appliances creates a competitive advantage. Namely: inexpensive resilient storage that their competitors are paying an arm and a leg for. Now, the storage industry’s best kept secret is finally out.

It Doesn’t Stop There

LINBIT is pioneering open source SDS. In development for over seven years, the new solution creates standard high availability clusters like those described above and also works perfectly for cloud storage. The LINBIT SDS software introduces performance and scalability advantages to the design. LINBIT has created a sort of “operating system based,” open source, software-defined storage technology that is already built into your existing operating system and ready to use with any Linux system.

The Default Replication Option

LINBIT’s DRBD software receives about 10,000 confirmed downloads per month (from people who opt in to share their statistics). LINBIT is far more engineering and development focused than sales focused, so if you aren’t solving a real-world problem, you have probably never run into them. LINBIT’s software popularity is user driven, for three main reasons:

Flexibility: Since the DRBD software replicates data at the block level, it works with any filesystem, VM, or application that writes data to a hard drive. It can replicate multiple resources simultaneously so users don’t have to choose different replication technologies for every application/database running on the server.

Stability: Being accepted into the mainline Linux kernel is a very stringent process. DRBD has been in the kernel since version 2.6.33, released in 2010.

Synchronous: Prior to DRBD’s availability (no pun intended), the only option for synchronous replication was hardware (SAN and NAS devices). The DRBD software can run in synchronous or asynchronous mode, and can be used for local replication or geo-clustering across long distances.
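To make the synchronous/asynchronous distinction concrete, here is a minimal drbd-8.4-style resource file sketch. The hostnames, backing disks, and IP addresses are hypothetical examples, not a recommended production configuration.

```
# Hypothetical /etc/drbd.d/r0.res -- hostnames, disks, and IPs are examples.
resource r0 {
    protocol C;              # C = fully synchronous, for local HA clusters
    # protocol A;            # A = asynchronous, for long-distance geo-clustering
    device    /dev/drbd0;    # replicated block device the application writes to
    disk      /dev/sdb1;     # backing storage on each node
    meta-disk internal;
    on alice {
        address 192.168.1.10:7789;
    }
    on bob {
        address 192.168.1.11:7789;
    }
}
```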

Now that DRBD has tools to provision your storage, scaling out has never been easier. Interested in how this might apply to your projects? Check out some of LINBIT’s free, innovative technical documents, which describe how to set up a cluster for your specific environment. Have an idea that isn’t covered in the documentation? Reach out to [email protected] and ask if your idea is sane. They’ll consult the LINBIT engineering team and point you in the right direction. Most importantly, NEVER settle for unplanned downtime.

Find out more about the costs of downtime in the podcast, The OrionX Download with LINBIT CEO, Brian Hellman.

DRBD and Randtronics DPM

Today we’re happy to announce a new document titled “Block Replication with Filesystem Encryption” which showcases another wonderful use case for DRBD.

Block Replication with Filesystem Encryption

At Hosting Con, back in April of this year, some colleagues of mine ran into representatives from Randtronics. Randtronics is the company responsible for the DPM (Data Privacy Management) software suite. This software suite provides file encryption, user management, ACLs, and more. I can imagine this software proving useful to those in fields where data privacy is an absolute must; medicine, law, human resources, and intellectual property quickly come to mind.

(Graphic is property of Randtronics)

After a brief discussion with us regarding just how versatile DRBD can be, we decided to see whether DRBD could work seamlessly with DPM. Randtronics’ DPM can help protect your data from prying eyes, or those who may wish to steal it, but can it protect your data from system failures? When DPM is teamed up with DRBD, you can be assured that your data is both secure and available.

I worked briefly with Gary Lansdown of Randtronics to introduce him to AsciiDoc, but I must give credit to Randtronics for this document.