
IOPS World Record Broken – LINBIT Tops 14.8 million IOPS

In a performance test, LINBIT measured 14.8 million IOPS on a 12-node cluster built from standard off-the-shelf Intel servers. This is the highest storage performance on the market for a hyper-converged system on this class of hardware. Even a small LINBIT storage system can provide millions of IOPS at latencies of a fraction of a millisecond. For real-world applications, these figures translate into outstanding application performance.

Test setup

LINBIT chose this setup because our competitors have published test results from equivalent systems, making it easy to compare the strengths of each software offering under fair conditions in the same environment. We worked hard to get the most out of the system, and it paid off: Microsoft reached 13.7 million IOPS, and StorPool marginally topped that with 13.8 million IOPS. We reached 14.8 million remote read IOPS – a significant jump of 7.2%! “These performance numbers mark a milestone in the development of our software. The results prove that we speed up High Availability at large scale,” says CEO Philipp Reisner. The numbers would scale up even further with a larger setup.



These exciting results are for 3-way synchronous replication using DRBD. The test cluster was provided through the Intel®️ Data Center Builders program. It consists of 12 servers, each running 8 instances of the benchmark, making a total of 96 instances. The setup is hyper-converged, meaning that the same servers are used to run the benchmark and to provide the underlying storage.




For some benchmarks, one of the storage replicas is a local replica on the same node as the benchmark workload itself. This is a particularly effective configuration for DRBD.

DRBD provides a standard Linux block device, so it can be used directly from the host, from a container, or from a virtual machine. For these benchmarks, the workload runs in a container, demonstrating the suitability of LINBIT’s SDS solution, which consists of DRBD and LINSTOR, for use with Kubernetes.

IOPS and bandwidth results are the totals from all 96 workload instances. Latency results are averaged.

Let’s look into the details!

Top performance with DRBD

5.0 million synchronously replicated write IOPS

This was achieved with a 4K random write benchmark with an IO depth of 64 for each workload. The setup uses Intel® Optane™ DC Persistent Memory to store the DRBD metadata. The writes are 3-way replicated with one local replica and two remote replicas. This means that the backing storage devices are writing at a total rate of 15 million IOPS.
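As a quick sanity check on the arithmetic (a minimal sketch; the figures are taken from the text above):

```python
# Values from the article: 5.0 million replicated 4K write IOPS across all
# 96 workload instances, each write persisted to 3 replicas.
measured_write_iops = 5_000_000
replicas = 3

# Every replicated write is a physical write on each of the three backing
# devices, so the backend write rate is the product of the two.
backend_write_iops = measured_write_iops * replicas
print(f"{backend_write_iops:,} IOPS on the backing devices")  # 15,000,000 IOPS
```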

85μs synchronously replicated write latency

This was achieved with a 4K random write benchmark with serial IO, that is, an IO depth of 1. This means that the writes were persisted to all 3 replicas within an average time of only 85μs. DRBD attained this level of performance both when one of the replicas was local and when all were remote. The setup again uses Intel® Optane™ DC Persistent Memory to store the DRBD metadata.

14.8 million remote read IOPS

This was achieved with a 4K random read benchmark with an IO depth of 64. This corresponds to 80% of the total theoretical network bandwidth of 75GB/s. This result was reproduced without any usage of persistent memory so that the value can be compared with those from our competitors.

10.6 million IOPS with 70/30 mixed read/write

Representing a more typical real-world scenario, this benchmark consists of 70% reads and 30% writes and used an IO depth of 64. One of the 3 replicas was local.



Benefits of persistent memory

DRBD is optimized for persistent memory. When the DRBD metadata is stored on an NVDIMM, write performance is improved.

When the metadata is stored on the backing storage SSD with the data, DRBD can process 4.5 million write IOPS. This increases to 5.0 million when the metadata is stored on Intel® Optane™ DC Persistent Memory instead, an improvement of 10%.

Moving the metadata onto persistent memory has a particularly pronounced effect on the write latency. This metric plummets from 113μs to 85μs with this configuration change. That is, the average write is 25% faster.
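The quoted 25% follows directly from the two latency figures; a minimal check of the arithmetic:

```python
# Average 4K write latency, metadata on the backing SSD vs. on persistent memory.
latency_ssd_us = 113
latency_pmem_us = 85

# Relative improvement from moving the DRBD metadata onto the NVDIMM.
improvement = (latency_ssd_us - latency_pmem_us) / latency_ssd_us
print(f"{improvement:.0%} faster on average")  # 25% faster on average
```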

Detailed results

Below are the full results for DRBD running on the 12 servers with a total of 96 benchmark workloads.

Benchmark name                                Without local replica    With local replica
Random read (higher is better)                14,800,000 IOPS          22,100,000 IOPS
Random read/write 70/30 (higher is better)    8,610,000 IOPS           10,600,000 IOPS
Random write (higher is better)               4,370,000 IOPS           5,000,000 IOPS
Sequential read (higher is better)            64,300 MB/s              111,000 MB/s
Sequential write (higher is better)           20,700 MB/s              23,200 MB/s
Read latency (lower is better)                129 μs                   82 μs
Write latency (lower is better)               85 μs                    84 μs

The IOPS and MB/s values have been rounded down to 3 significant figures.

All volumes are 500GiB in size, giving a total active set of 48,000GiB and consuming a total of 144,000GiB of the underlying storage. The workloads are generated using the fio tool with the following parameters:

Benchmark type    Block size    IO depth    Workload instances    Total active IOs
Random            4K            64          96                    6144
Sequential        128K          16          96                    1536
Latency           4K            1           96                    96
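The “Total active IOs” column is simply the IO depth multiplied by the number of workload instances; a small sketch reproducing it:

```python
# fio parameters from the table above.
benchmarks = {
    "Random":     {"block_size": "4K",   "iodepth": 64, "instances": 96},
    "Sequential": {"block_size": "128K", "iodepth": 16, "instances": 96},
    "Latency":    {"block_size": "4K",   "iodepth": 1,  "instances": 96},
}

# Total IOs in flight = per-instance IO depth x number of instances.
totals = {name: p["iodepth"] * p["instances"] for name, p in benchmarks.items()}
print(totals)  # {'Random': 6144, 'Sequential': 1536, 'Latency': 96}
```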

Quality controls

In order to ensure that the results are reliable, the following controls were applied:

  • The entire dataset was written after allocating the volumes, but before running the tests. This prevents artificially fast reads of unallocated blocks. When the backing device driver or firmware recognizes that an unallocated block is being read, it may simply return zeros without reading from the physical medium.
  • The benchmark uses direct IO to bypass the operating system cache and the working set was too large to be cached in memory in any case.
  • The tests were each run for 10 minutes. The metrics stabilized within a small proportion of this time.
  • The measurements were provided by the benchmarking tool itself, rather than being taken from a lower level such as the DRBD statistics. This ensures that the performance corresponds to that which a real application would experience.
  • The random pattern used for the benchmark used a random seed to avoid any bias due to the same blocks being chosen by subsequent test runs.
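A sketch of how these controls map onto fio options. The flags shown are standard fio options; the job name and DRBD device path are illustrative, not taken from the test setup:

```python
def fio_control_args(job_name: str, device: str) -> list[str]:
    """Build fio arguments implementing the quality controls above."""
    return [
        "fio",
        f"--name={job_name}",
        f"--filename={device}",
        "--direct=1",        # bypass the page cache (direct IO)
        "--time_based",
        "--runtime=600",     # 10-minute runs
        "--randrepeat=0",    # fresh random seed for each run
    ]

args = fio_control_args("randread", "/dev/drbd1000")
print(" ".join(args))
```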

Software stack

The following key software components were used for these benchmarks:

  • Distribution: CentOS 8.0.1905
  • Kernel: Linux 4.18.0-80.11.2.el8_0.x86_64
  • LVM from distribution kernel
  • DRBD 9.0.21-1
  • Docker 19.03.5
  • Fio 3.7


In this text, and at LINBIT in general, we use the expression “2 replicas” to mean that the data is stored on 2 storage devices. For these tests there are 3 replicas, meaning that the data is stored on 3 storage devices.

In other contexts, the expression “2 replicas” might mean one original plus 2 copies, in which case the data would be stored on 3 storage devices.

Test infrastructure

These results were obtained on a cluster of 12 servers made available as part of the Intel® Data Center Builders program. Each server was equipped with the following configuration:

  • Processor: 2x Intel® Xeon Platinum 8280L CPU
  • Memory: 384GiB DDR4 DRAM
  • Persistent memory: 4x 512GB Intel® Optane™ DC Persistent Memory
  • Storage: At least 4x Intel® SSD DC P4510 of at least 4TB
  • Network: Intel® Ethernet Network Adapter XXV710 with dual 25GbE ports

The servers were all connected in a simple star topology with a 25Gb switch.



Speed is of the essence

Storage has often been a bottleneck in modern IT environments. The two requirements, speed and high availability, have always been in competition: aim for maximum speed and the quality of the high availability tends to suffer, and vice versa. With this performance test we demonstrate a best-of-breed open source software-defined storage solution. A replicated storage system that combines high availability with the performance of local NVMe drives is now possible.

This technology enables any public and private cloud builder to deliver high performance for their applications, VMs and containers. If you aim to build a powerful private or public cloud, our solution meets your storage performance needs.

If you want to learn more or have any questions, do contact us at [email protected]


Joel Colledge on Linkedin
Joel Colledge
Joel is a software developer at LINBIT with a background in mathematics. A polyglot programmer, Joel enjoys working with many different languages and technologies. At LINBIT, he has been involved in the development of LINSTOR and DRBD. Originally from England, Joel is now based in Vienna, Austria.



LINBIT announces Piraeus Datastore – Software-Defined Storage (SDS) for Kubernetes

Piraeus Datastore offers a high-performance, highly reliable SDS solution for Persistent Volumes in Kubernetes


San Diego, November 18, 2019 – LINBIT, the inventor of the open-source software DRBD™ and a leader in Linux storage software, has announced the project “Piraeus Datastore”. Piraeus Datastore offers a fast, stable way for users to provide Persistent Volumes for their Kubernetes applications. It is developed under the open-source development model and has been made publicly available via pre-built packages and containers in a joint effort between LINBIT, DaoCloud, and the open-source community.


Serving the rapidly growing market for containerized applications, Piraeus Datastore fills a significant gap by providing a stable Software-Defined Storage solution for Kubernetes applications that require high-performance block storage. “When it comes to storage, Piraeus helps clients achieve better system availability than competitors in the space can offer,” says Philipp Reisner, CEO of LINBIT. “Most Kubernetes storage newcomers combined the data and control planes to push out a minimum viable product. By separating these components, we ensure that a controller failure doesn’t impact storage system availability.”


DaoCloud has been a leading cloud-native computing vendor in China since 2014, with extensive experience running Kubernetes in production for Fortune Global 500 customers in manufacturing and finance, including SAIC, Haier, and SPDB bank. They believe that Piraeus fits their clients’ needs for cloud-native, container-attached storage that provides both reliability and performance. Roby Chen, CEO of DaoCloud, says: “It is very exciting that we have the Piraeus project to elevate DRBD technology into the cloud-native arena, where we believe it can play a key role.”

Kubernetes is generating more and more buzz

Piraeus Datastore provides data persistence for elastic applications, which dynamically create or remove containers depending on the load. A recent survey of 390 IT professionals showed that 51% of participants have seen an increase in Kubernetes adoption in the last six months, and 86% of respondents say they have now adopted Kubernetes, up from 57% half a year ago. Storage remains a requirement for enterprises beginning to move their applications over to Kubernetes, and Piraeus Datastore is a perfect fit for clients who need the combination of reliability and performance.


With Piraeus Datastore, two key open source technologies are packaged in a way that is easily accessible and consumable for Kubernetes users: the proven DRBD technology delivers the highest performance, and the way LINSTOR™ separates control and data paths delivers real-world operable clusters.

Piraeus Datastore makes proven technologies cloud-native. DRBD has seen continuous development, improvement, and optimization for 19 years. LINSTOR stands out in the field of SDS systems by having a control plane that is separate and independent from the data plane. The big advantage of this separation is that it makes upgrades in a running storage cluster feasible, which saves downtime and therefore a lot of money.

Piraeus Datastore is perfectly suited for databases, AI, and analytics workloads, which demand the throughput and latency of primary storage.


LINBIT SDS™ is the industry’s fastest software-defined storage solution for enterprise, cloud, and container environments. LINBIT SDS leverages the DRBD™ and LINSTOR™ technologies to provision, replicate, and manage data storage, independent of the underlying hardware. The Piraeus containers are intended for adoption by the community, while the LINBIT SDS containers are intended for corporate users. The LINBIT SDS product comes with enterprise support options, while Piraeus is supported by the community.


For a close comparison, check out the following overview:


                               Piraeus Datastore                LINBIT SDS
Container base image           Debian                           UBI
Pre-built images available     Publicly on Dockerhub, Quay      For LINBIT customers
Support                        Community only                   ✅ Enterprise, incl. 24/7
Runs with OpenShift            Without DRBD
OpenShift certified            n.a.
Kernel module                  Compile from source              Compile & pre-compiled
Contains                       DRBD, piraeus-operator, linstor-csi and
Licensing                      Open source software, GPL & Apache
Developed and verified for     Kubernetes container orchestration and Red Hat-primed OpenShift
Platforms on the roadmap       IBM Cloud Container, SUSE CaaS


Learn more:


About LINBIT


LINBIT is the force behind DRBD and a leader in open-source Linux block storage software for enterprise and cloud computing. LINBIT software has helped dozens of global companies, such as Volkswagen, Intel, Cisco, Siemens, and the BBC, to provide High Availability (HA), Geo Clustering for Disaster Recovery (DR), and Software-Defined Storage (SDS) for public and private clouds. Based in Vienna, LINBIT partners with companies such as Red Hat, Intel, IBM, and DaoCloud to accelerate Linux storage software. For more information, visit or follow @linbit


Social Media Channels:


LINBIT on Twitter

LINBIT on Linkedin

LINBIT on Youtube

LINBIT on Facebook


About DaoCloud


DaoCloud has been a leading cloud-native computing vendor in China since 2014. They have extensive experience running Kubernetes in production with Fortune Global 500 customers in manufacturing and finance, including SAIC, Haier, and SPDB bank.


PR contact:


Sebastian Schinhammer

Marketing Manager


Phone: 0043 1 817 82 92 -64

LINBIT HA-Solutions GmbH

Vivenotgasse 48

1120 Wien


Bank replaces Veritas with DRBD HA

Use Case: Bank replaces Veritas Volume Replicator with LINBIT HA

In just the last few months, LINBIT, with the help of Red Hat, closed a deal with a large, well-known retail and commercial bank with branches across England and Wales. Since 2019, LINBIT has seen growing interest in our open source products from banks and the financial sector. A bank operating in the UK changed its High Availability solution from Veritas Volume Replicator to LINBIT HA. Here is all the relevant information about the how, what, and why!


A long time ago, financial institutions ran a large number of internal data-processing services, and the IT department created blueprints, or standard architectures, showing how services should be deployed.

At that time, servers were running Sun Solaris or IBM’s AIX operating system. Veritas Cluster Server (VCS) – or sometimes Volume Replicator (VVR) – was the software used to keep these servers up and running.

Fast forward to 2019. Most new services get deployed on Linux, namely Red Hat Enterprise Linux (RHEL). While RHEL was swapped in as the operating system, the cluster stack used to form HA clusters remained unchanged (VCS and VVR).

With an OS already in the stack, Red Hat uses this opportunity to promote its own answer to HA clustering. Under the name “High Availability Add-On”, Red Hat brings the open source Pacemaker technology to customers as its replacement for VCS.

DRBD replaces VVR

In some cases, VCS is deployed with VVR. This is where LINBIT’s HA comes in. It acts as a replacement for VVR, and is perfectly integrated with Pacemaker in technical terms and in terms of support. Red Hat and LINBIT combined their support forces via TSANet, which gives customers a seamless support experience in a multi-vendor relationship.

LINBIT also knows all there is to know about Pacemaker, since it is the de facto standard HA cluster manager in the open source community. When a customer in this context sends us a question related to Pacemaker, we simply answer it instead of referring them to Red Hat.

Additionally, the TCO of the DRBD solution is far lower than with VCS and VVR. The bank can also rely on round-the-clock (24 x 365) remote support. And there is no vendor lock-in, because it is open source.

This big bank in Great Britain chose LINBIT HA with the help of Red Hat, and they are very happy about it.


The LINBIT solution is a great piece in our internal infrastructure pushing the system to the next level – the performance, stability, and support is outstanding. Those guys deliver.

– Mark –

Chief Technology Officer

This is not the end…


A bank’s business is money, but that does not mean the organization wants to spend more than necessary on its IT infrastructure. LINBIT’s DRBD is an effective way to keep highly available services as reliable as they should be, while gaining more room to maneuver for investments in emerging technologies.


If you have any questions, don’t hesitate to email us at [email protected]

August 2019 – Newsletter



We’ll be speaking at the Flash Memory Summit on August 7th. Come see your favorite Engineer David Hay’s presentation on the “Key-Value Store and Linux Technologies.”

Cheap Votes: DRBD Diskless Quorum

Prevent inadequate fencing! Read about DRBD Diskless Quorum!

DRBD/LINSTOR vs Ceph – a technical comparison

Ever wonder what the differences are between Ceph and DRBD/LINSTOR? Well, we did too and we’re sharing it with you.

Coming Soon, a New DRBD Proxy Release

The next release of DRBD Proxy will come with improvements in data replication and compression. Check out what you have to look forward to!

Service & Support

Our first priority is you. Don’t hesitate to contact us.

Facebook    Twitter    LinkedIn    YouTube    LINBIT

LINSTOR LDAP Authentication

New Features of LINSTOR Release – July 2019

The newest LINSTOR release (July 2019) came with a bunch of new features, and one is really worth highlighting:

The developers of LINSTOR, the storage management tool for all things Linux, announced that the latest release comes with LDAP authentication. Software-defined storage consumers were demanding privilege authentication, so we made this a priority in July.

With support for basic LDAP authentication, you can configure an LDAP server and a search_filter to allow only members of a certain group access to LINSTOR. To accomplish this, here’s a sample configuration entry:


  enabled = true
  uri = "ldaps://"
  dn = "uid={user},ou=users,o=ha,dc=example"
  search_base = "dc=example"
  search_filter =



The `{user}` template variable will be replaced with the login user.
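For illustration, this is how the dn template from the sample configuration expands (the login name “alice” is hypothetical; `str.format` is used here only to mimic the substitution LINSTOR performs):

```python
# dn template taken from the sample configuration above.
dn_template = "uid={user},ou=users,o=ha,dc=example"

# LINSTOR substitutes the login user for the {user} placeholder.
bind_dn = dn_template.format(user="alice")
print(bind_dn)  # uid=alice,ou=users,o=ha,dc=example
```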

Please note that LINSTOR must be configured with HTTPS in order to configure LDAP authentication. 

Now you can securely manage privileges of your storage clusters, so the antics of those pesky interns don’t keep you awake at night.


Greg Eckert on Linkedin
Greg Eckert
In his role as the Director of Business Development for LINBIT America and Australia, Greg is responsible for building international relations, both in terms of technology and business collaboration. Since 2013, Greg has connected potential technology partners, collaborated with businesses in new territories, and explored opportunities for new joint ventures.

LINBIT Announces DRBD Support for Amazon Linux

The Digital Transformation

The concept of “Digital Transformation” for executive teams at Fortune-sized companies is no longer a new and flashy phrase. An important part of this Digital Transformation is how companies think about cloud computing. Where once organizations seemed to have only two options – enter the cloud or keep everything on premises – the choices are now a bit more “cloudy” (pun intended).

In the digital transformation age, Fortune companies are looking at multi-cloud strategies. They understand that siloing data into one cloud provider decreases their flexibility and ability to negotiate discounts while increasing the risks of a provider outage affecting production workloads. When Fortune 1000 companies think about their multi-cloud strategies they basically have 3 options:

  1. Keep some data on-prem and put some in the cloud
  2. Put data in different regions or zones within a single cloud provider
  3. Place data in many separate cloud providers

What’s great about all three is that companies can be dynamic about how they solve business goals, allocate budget, and provision resources. With this multi-cloud shift, some of the traditional technologies used in businesses need to adapt and change.

One of our Fortune 500 clients, a developer of financial, accounting, and tax preparation software, came to us because they were switching an OS installation from RHEL to Amazon Linux. They are clearly deep into their Digital Transformation journey: this workload was already in the cloud. Changing both the OS and the automation toolchain of a cloud deployment this large is no easy feat.

As a small team, we pride ourselves on jumping high at client requests, and within two weeks the work was done. The answer is: “Yes, LINBIT now supports DRBD 9.0 on Amazon Linux.” As client demand changes, as workloads migrate to the cloud, and as containers gain traction, we are doing our best to stay dynamic by listening to community and client feedback.

With millions of downloads, we rely on clients and the open-source community users to tell us what they want. If you haven’t been following our progress, this means we are thinking about how to improve performance for Linux High Availability Clusters and Disaster Recovery clusters for traditional workloads on hardware like NVMe and Optane, while also looking into kernel technology’s role in Kubernetes environments in conjunction with public and private cloud environments. What challenges exist here that didn’t before? What do users want? These are the questions that drive our development.

So DRBD users: we’re here, and we’re listening. Feel free to chime in on the community IRC (#drbd on freenode) and mailing list, respond in the comments here, or ask questions about our YouTube videos, and let’s ensure that open source continues to drive innovation as the commercial giants decide which technologies to choose in their 5-year technology goals.


NuoDB on LINSTOR-Provisioned Kubernetes Volumes


NuoDB and LINBIT put our technologies together to see just how well they performed, and we are both happy with the results. We decided to compare LINSTOR provisioned volumes against Kubernetes Hostpath (Direct Attached Storage) in a Kubernetes cluster hosted in Google’s cloud platform (GCP) to show that our on-prem testing results can also be proven in a popular cloud-computing environment.


NuoDB is an ANSI SQL-compatible, ACID-compliant, container-native distributed OLTP database that provides responsive scalability and continuous availability. This makes it a great choice for distributed applications running in cloud provider-managed and open source Kubernetes environments such as GKE, EKS, AKS, and Red Hat OpenShift. As you scale out NuoDB Transaction Engines in your cluster, you’re scaling out the database’s capacity to process dramatically more SQL transactions per second while at the same time building in process redundancy to ensure the database – and applications – are always on.

As you scale your database, you also need to scale the storage that the database is using to persist its data. This is usually where things get sluggish. Highly-scalable storage isn’t always highly-performant, and it seems most of the time the opposite is true. Highly-scalable, highly-performant storage is the niche that LINSTOR aims to fill.

LINSTOR, LINBIT’s SDS software, can be used to deploy DRBD devices in large scale storage clusters. DRBD devices are expected to be about as fast as the backing disk they were carved from, or as fast as the network device DRBD is replicating over (if DRBD’s replication is enabled). At LINBIT we usually aim for a performance impact of less than 5% when using DRBD replication in synchronous mode.

The LINSTOR CSI (container storage interface) driver for Kubernetes allows you to dynamically provision LINSTOR provisioned block devices as persistent volumes for your container workloads… you see where I’m going… 🙂


I spun up a 3-node GKE (Google Kubernetes Engine) cluster in GCP and customized the standard node type with 6 vCPUs and 22GB of memory for each node:


When using GKE to spin up a Kubernetes cluster, you’re provided with a “standard” storage class by default. This “standard” storage class dynamically provisions and attaches GCE standard disks to your containers that need persistent volumes. Those GCE standard disks are the pseudo “hostpath” device we wanted to compare against, so we deployed NuoDB into the cluster, and ran a YCSB (Yahoo Cloud Serving Benchmark) SQL workload against it to generate our baseline:


Using the NuoDB Insights visual monitoring tool (which comes as standard equipment with NuoDB), we can see in the chart above that we had 3 TE (Transaction Engine) pods feeding into 1 SM (Storage Manager) pod. We can also see that our Aggregate Transaction Rate (TPS) is hovering just over 15K transactions per second. As a side note, this deployment created 5 GCE standard disks in my Google Compute Engine account.

LINSTOR provisions its storage from an established LINSTOR cluster, so for our LINSTOR comparison I had to stand up Kubernetes on GCE nodes “the hard way” so that I could also stand up a LINSTOR cluster on the nodes (see LINBIT’s user’s guide or the LINSTOR quickstart for more on those steps). I created 4 nodes as GCE VM instances: 3 nodes set up to mimic the GKE cluster, each with 6 vCPUs and 22GB of memory, and 1 master node – with a master-node taint in Kubernetes so no pods would be scheduled on it – with 2 vCPUs and 16GB of memory. Google recommended I scale these nodes back to save money, so I did, resulting in the following VM instances:


After setting up the LINSTOR and Kubernetes cluster in the GCE VM Instances, I attached a single “standard” GCE disk to each node for LINSTOR to provision persistent volumes from, and deployed the same NuoDB distributed database stack and YCSB workload into the cluster:


After letting the benchmarks run for some time, I could see that we were hovering just under 15k, which is within the expected 5% of our ~15k baseline!
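To make the acceptance criterion concrete, here is the check in sketch form. The measured figure below is hypothetical; the article only says “just under 15k”:

```python
baseline_tps = 15_000   # hostpath / GCE standard-disk baseline
linstor_tps = 14_400    # hypothetical "just under 15k" reading

# DRBD synchronous replication is expected to cost less than 5%.
overhead = (baseline_tps - linstor_tps) / baseline_tps
print(f"overhead: {overhead:.1%}, within budget: {overhead < 0.05}")
```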


You might be thinking, “That’s good and all, but why not just use GKE with the GCE-backed ‘standard’ storage class?” The answer is features. Using LINSTOR to provide storage to your container platform enables you to:

  • Add replicas of volumes for resiliency at the storage layer – including remote replicas
  • Use replicas of your volumes in DRBD’s read balancing policies which could increase your read speeds beyond what’s possible from a single volume
  • Provide granular control of snapshots at either the Kubernetes or LINSTOR-level
  • Provide the ability to clone volumes from snapshots
  • Enable transparently encrypted volumes
  • Provide data-locality or accessibility policies
  • Lower managerial overhead in terms of the number of physical disks (comparing one GCE disk for each PV with GKE vs. one GCE disk for each storage node with LINSTOR).

Ultimately, the combination of NuoDB and LINSTOR enables clients to run high-performance persistent databases in the cloud or on premise with ease of scale and “always-on” resiliency. So far, after testing both proprietary and open-source software, NuoDB has found that LINSTOR’s open-source SDS is a production-ready, high-performance, and highly reliable storage solution for provisioning persistent volumes.


Matt Kereczman on Linkedin
Matt Kereczman
Matt is a Linux Cluster Engineer at LINBIT with a long history of Linux System Administration and Linux System Engineering. Matt is a cornerstone in LINBIT’s support team, and plays an important role in making LINBIT’s support great. Matt was President of the GNU/Linux Club at Northampton Area Community College prior to graduating with Honors from Pennsylvania College of Technology with a BS in Information Security. Open Source Software and Hardware are at the core of most of Matt’s hobbies.

July 2019 newsletter


JULY 2019

Didn’t catch us at KubeCon Shanghai in June? We’ll be attending OSCON July 15th-18th in Portland, Oregon! Meet us there!

Key/Value Store in LINSTOR

LINSTOR plug-in developers, we’ve just made your life a little easier!

DRBD and the sync-rate controller, part 3

Continue your journey into the recesses of the sync-rate controller. This article will give you the knowledge you need to sync your data effectively.

What’s the Difference Between Off-Site Data Backup and Disaster Recovery (DR)?

Is a backup good enough? What’s the most downtime your business can tolerate?

Disaster Recovery (DR) Explained, LINBIT Q&A Series, Episode 3

In this episode, your favorite engineer, David Hay, explains the importance of DR and the positive impact of a near real-time replica utilizing DRBD.

Service & Support

Our first priority is you. Don’t hesitate to contact us.

Cheap Votes: DRBD Diskless Quorum

One of the most important considerations when implementing clustered systems is ensuring that a cluster remains cohesive and stable given unexpected conditions. DRBD already has fencing mechanisms and even a system of quorum, which is now capable of using a diskless arbitrator to break ties without requiring additional storage beyond that of two nodes.

Quorum and Fencing With a Healthy Dose of Reality

DRBD’s quorum implementation allows resources to vote on availability, taking into account connection state and disk state. While a DRBD cluster without quorum will allow promotion and writes on any node with “UpToDate” data, DRBD with quorum enabled adds the requirement that this node must also be in contact with either a majority of healthy nodes in the cluster, or a minimum amount of nodes as defined statically. This requires at least three nodes, and works best with odd numbers of nodes. A DRBD cluster with quorum enabled cannot become split-brain.
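The majority rule can be sketched as a simplified model. Real DRBD also weighs disk states such as UpToDate and supports a statically configured minimum, which this sketch ignores:

```python
def has_quorum(reachable: int, cluster_size: int) -> bool:
    """True if a partition containing `reachable` nodes (counting the node
    itself) holds a strict majority of the cluster."""
    return reachable > cluster_size // 2

# 3-node cluster: the 2-node side keeps quorum; an isolated node loses it,
# so it can no longer be promoted or accept writes.
print(has_quorum(2, 3))  # True
print(has_quorum(1, 3))  # False
```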

Fencing on the other hand, employs a mechanism to ensure node state by isolating or powering off a node in some way so that unhealthy nodes can be guaranteed to not provide services (by virtue of being assuredly offline). While the use case for fencing and quorum overlap by a large degree, fencing can automatically eject or recover misbehaving nodes, while quorum simply ensures that they cannot modify data.

It is possible to utilize scripts that are triggered in response to changes in quorum as a simple but effective fencing system via a “suicide” method — configuring a node to automatically reset or power itself off upon loss of quorum (accomplished via the “on-quorum-loss” handler in DRBD’s configuration). However, fully-fledged fencing methods via Pacemaker have much more logic behind them, can work even when the node to be fenced is entirely unresponsive, and make Pacemaker clusters “aware” of fencing actions.

The most important element to consider is that while both methods prevent split-brain conditions, quorum does not wholly replace out-of-band fencing. However, it comes close enough that in many configurations quorum alone can stand in for Pacemaker-based fencing, particularly where fencing via privileged APIs (as is common in clouds) or dedicated fencing hardware (such as network PDUs or IPMI cards) is impractical or undesired.


Until now, in order for a DRBD resource to have three quorum votes across three nodes, it needed three replicas of the data. This was cost-prohibitive in some scenarios, so additional logic was added to allow a diskless “arbitrator” node that does not participate in replication. Thus, the diskless DRBD arbitrator was born.

The concept is fairly simple; rather than require a minimum of three replicas in a DRBD resource to enable quorum functionality, one can now use two replicas (or “data” nodes) with a third DRBD node in a permanently and intentionally diskless state as an “arbitrator” for breaking ties.

The same concepts of traditional DRBD quorum apply, with one significant exception: In a replica 2+A cluster, one node can be lost or disconnected without losing quorum — just like a replica 3 cluster. However, that arbitrator node cannot (on its own) participate in restoring quorum after it is lost.

The reason for this exception is simple: the arbitrator node has no disk. Without a disk, there is no way to independently determine whether data is valid, inconsistent, or related to the cluster at all, because there is no data on that node to compare replicas against. While an arbitrator node cannot restore quorum to a single inquorate data node, two data nodes may establish or re-establish quorum with each other. This is highly effective, and handles the vast majority of quorum decisions at roughly two-thirds the cost of a replica 3 cluster.

Arbitrator Nodes in Action

I will not abide this level of grandstanding without a demonstration of this ability (and hopefully some revealing use cases), so below are some brief test results from a replica 2+A geo cluster. Behold:

root@geo-nfs-a:~# drbdadm status export-able
export-able role:Primary
  geo-nfs-b role:Secondary
  geo-nfs-c role:Secondary


As you can see, everything is happy. All of these nodes are connected and up to date. Nodes “geo-nfs-a” and “geo-nfs-b” are data nodes with disks. The node “geo-nfs-c” is a diskless DRBD arbitrator as well as a Booth arbitrator, and quorum has been enabled in this geo cluster (though that’s not reflected in this output). The datapath of a geo cluster can be tricky to manage, since geo clusters often operate outside the scope of rapid decision-making mechanisms and, even more often, have no adequate way of fencing entire sites. Using DRBD quorum in this case prevents split-brain globally, rather than depending on several disconnected cluster controllers to coordinate. This is much more stable, but requiring three sites with at least one full data replica each is bandwidth-intensive as well as expensive. That makes this a perfect fit for an arbitrator node.

If we take one of the two data nodes offline, the cluster will still run. We’re still in contact with the arbitrator, and as long as we don’t lose that contact, quorum will be held:

root@geo-nfs-a:~# drbdadm status export-able
export-able role:Primary
  geo-nfs-b connection:Connecting
  geo-nfs-c role:Secondary


So let’s make it unhappy. If we take the majority of nodes offline, this cluster will freeze, suspending I/O and protecting data from split-brain:

root@geo-nfs-a:~# drbdadm status export-able
export-able role:Primary suspended:quorum
  disk:UpToDate quorum:no blocked:upper
  geo-nfs-b connection:Connecting
  geo-nfs-c connection:Connecting

Reconnecting only the arbitrator node will not result in a quorate cluster, as the arbitrator has no way of knowing whether the remaining data node’s data is actually valid:

root@geo-nfs-a:~# drbdadm status export-able
export-able role:Primary suspended:quorum
  disk:UpToDate quorum:no blocked:upper
  geo-nfs-b connection:Connecting
  geo-nfs-c role:Secondary


Connecting the peer data node will result in I/O resuming even if the arbitrator is still not functioning:

root@geo-nfs-a:~# drbdadm status export-able
export-able role:Primary
  geo-nfs-b role:Secondary
  geo-nfs-c connection:Connecting


I was able to use a Booth arbitrator node as a DRBD arbitrator node as well, managing the cluster application state while also securing the datapath against corruption, with almost zero bandwidth usage beyond that of a 2N system. This is clearly a potent use case, and it could not be simpler.

This new quorum mechanism could be applied identically to local high availability clusters, allowing reliable quorate systems to be established using a very low power third node. This can help to cheaply circumvent environmental problems that prevent adequate fencing, such as generic platform-agnostic deployment models, security-restricted environments, and even total lack of out-of-band fencing mechanisms (such as some public clouds or specialized hardware).

For posterity, the following DRBD configuration was used to accomplish this. Keep in mind, this was a geo cluster, so it’s using asynchronous replication (protocol A). Protocol C would be used for synchronous local replication:

# /etc/drbd.conf

global {
    usage-count yes;
}

common {
    options {
        auto-promote     yes;
        quorum           majority;
    }
}

resource export-able {
    volume 0 {
        device           minor 0;
        disk             /dev/drbdpool/export-able;
        meta-disk        internal;
    }

    on geo-nfs-a {
        node-id 0;
        address          ipv4;
    }

    on geo-nfs-b {
        node-id 1;
        address          ipv4;
    }

    on geo-nfs-c {
        node-id 2;
        volume 0 {
            device       minor 0;
            disk         none;
        }
        address          ipv4;
    }

    connection-mesh {
        hosts geo-nfs-a geo-nfs-b geo-nfs-c;
        net {
            protocol A;
        }
    }
}

David Hay on Linkedin
David Hay
Cluster Daemon at LINBIT
A long-time Linux system engineer, David Hay finds FOSS solutions to global problems as a Cluster Engineer at LINBIT. David started out with open source software back in the Linux 2.4 days, since then having planned and implemented countless clustered systems, leveraging HA and cloud technologies to great effect. When not liberating the enterprise world with free and open software, he spends his time tinkering with electronics and metalworking.

DRBD and the sync-rate controller, part 3

This is an update to our previous two blog posts here and here. The goal of this post is to simplify even further the steps needed to tune the sync-rate controller. If you compare this post against the previous two, you’ll see that I omit a few options and simply pick starting values that have worked well in most deployments we’ve encountered.

I would also like to point out again that this is all about initial device synchronization and recovery resynchronization. It has no effect on replication speed during normal operation when everything is in a healthy state.

Purpose of the sync rate controller

The dynamic sync-rate controller for DRBD was introduced way back in version 8.3.9 as a way to slow down resynchronization. The idea is that if you have a write-intensive application running atop the DRBD device, it may already be close to filling your I/O bandwidth. We introduced the dynamic rate limiter to make sure that recovery resync does not compete for bandwidth with ongoing write replication. To ensure this, the defaults lean toward the conservative side.

If the defaults seem slow to you or your use case, you can speed things up with a little bit of tuning in the DRBD configuration.

Tuning the sync rate controller

It is nearly impossible for DRBD to know just how much activity your storage and network backend can handle. It is fairly easy for DRBD to know how much activity it generates itself, which is why we tune how much network activity we allow DRBD to generate.

  • Set c-max-rate to 100% of (or slightly more than) what your hardware can handle.
    • For example: if your network is capable of 10Gb/s but your disk throughput is only 800MiB/s, set this value to 800M.
  • Increase max-buffers to 40k.
    • 40k is usually a good starting point, but we’ve seen good results with anywhere between 20k to 80k.
  • Set c-fill-target to 1M.
    • Just trust us on this, and simply set it to ‘1M’.
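Put together, the three settings above land in the net and disk sections of a resource definition. A sketch assuming the 800MiB/s disk-throughput example and a hypothetical resource named “r0”:

```
resource r0 {
    net {
        max-buffers   40k;   # more in-flight buffers for resync traffic
    }
    disk {
        c-max-rate    800M;  # cap resync at what the backing disks can absorb
        c-fill-target 1M;    # amount of in-flight resync data the controller aims for
    }
}
```

After editing the configuration, `drbdadm adjust r0` applies the changed options at runtime, without downtime.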

This should be enough to push the resync rate well beyond the defaults. Many people tune the “c-*” sync-rate controller settings but never increase the max-buffers value. This may be partly our fault, as we never mentioned it in the previous blog posts, which is one reason I am revisiting this topic today.

Tuning the sync rate controller even further

Obviously, there is further tuning we can do. Some of these settings, if tuned improperly, may negatively impact the performance of applications writing to the DRBD device, so use caution. I suggest starting with smaller values and working your way up if performing this tuning on production systems.

  • Set the resync-rate to ⅓ of the c-max-rate.
    • With the dynamic resync-rate controller, this value is only used as a starting point. Changing this will only have a slight effect, but will help things speed up faster.
  • Increase the c-min-rate to ⅓ of the c-max-rate.
    • It is usually advised to leave this value alone as the idea behind the dynamic sync rate controller is to “step aside” and allow application IO to take priority. If you really want to ensure things always move along at a minimum speed, then feel free to tune this a bit. As I mentioned earlier, you may want to start with a lower value and work up if doing this on a production system.
  • Set sndbuf-size and rcvbuf-size to 10M.
    • This is generally auto-tuned by the kernel, but cranking it up may help move recovery resync along. There is also a possibility that this will lead to bufferbloat, so tune these with caution. Again, on a production system, start with a value just a little over 4M and increase it slowly while observing the systems.
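The optional settings above extend the same net and disk sections. A sketch assuming the earlier c-max-rate of 800M, of which one third is roughly 266M (rounded down here to 250M); the resource name “r0” is again hypothetical:

```
resource r0 {
    net {
        sndbuf-size 10M;     # larger send buffer; watch for bufferbloat
        rcvbuf-size 10M;     # larger receive buffer
    }
    disk {
        resync-rate 250M;    # starting point for the dynamic controller (~1/3 of c-max-rate)
        c-min-rate  250M;    # floor below which resync is not throttled by application I/O
    }
}
```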

It is our hope that the information above will prove useful to some of our users and help possibly clear up some confusion regarding the resync tunables we have discussed in the past. As always, please feel free to drop us a comment below if you have any questions or anything you’d like to share.

Devin Vance on Linkedin
Devin Vance
First introduced to Linux back in 1996, and using Linux almost exclusively by 2005, Devin has years of Linux administration and systems engineering under his belt. He has been deploying and improving clusters with LINBIT since 2011. When not at the keyboard, you can usually find Devin wrenching on an American motorcycle or down at one of the local bowling alleys.