Albireo Virtual Data Optimizer (VDO) on DRBD

TL;DR: Pairing DRBD with VDO cut replication network traffic and backing-disk writes by roughly 85% in our tests, while increasing system load by only ~0.8.

VDO (Virtual Data Optimizer)[1] is a ready-to-run software package that brings block-level deduplication, compression, and thin provisioning to Linux. VDO operates inline at 4 KB granularity, aiming for the best balance between performance and data reduction rates.

This sounds like great software to pair with DRBD, right?! We were most interested in adding deduplication to the storage stack above DRBD for more efficient replicated writes. So we ran some tests doing just that, measuring the IO on our backing disks as well as the traffic on our replication network; we did the same on a vanilla DRBD device for comparison.

We decided that deploying and cloning numerous CentOS 7 virtual machines on both of our devices was a good way to test VDO's deduplication and its effects on replication. We chose this test because VMs of the same distro will have identical blocks where their binaries, libraries, etc. (no, not that /etc) are stored. They will also have some blank space at the end of their virtual disks (zeros) that VDO should be able to compress/dedup down to almost nothing. Lastly, and most importantly, replicating VMs' virtual disks is a real-life use case and not just an academic experiment.

The Setup: First, we set up two DRBD devices and followed the appropriate steps[2] to use them as our LVM physical volumes (VDO requires LVM), created PV and VG signatures for both devices, and created the VDO device on one of our DRBD disks. We then formatted the resulting block devices with XFS filesystems, mounted them, and created CentOS 7 VMs with fully allocated 20GiB qcow2 virtual disks in each mount point. Since we also recorded system load during our testing, I should mention that the test systems each had 16 CPUs (Intel Xeon E7520s).
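For anyone who wants to follow along, here is a minimal sketch of that stack on the VDO node. The device, VG/LV, and mount point names are our own illustrative choices, and the vdo create syntax is that of the VDO manager utility, so check the man page shipped with your version:

    # DRBD resource r0 is already configured, connected, and Primary here.
    # On RHEL/CentOS 7, disable lvmetad so LVM sees the DRBD device (see [2]):
    #   set "use_lvmetad = 0" in /etc/lvm/lvm.conf, then:
    systemctl stop lvm2-lvmetad.service lvm2-lvmetad.socket
    systemctl disable lvm2-lvmetad.service lvm2-lvmetad.socket

    # LVM on top of DRBD (VDO requires a volume manager underneath):
    pvcreate /dev/drbd0
    vgcreate vg_r0 /dev/drbd0
    lvcreate -l 100%FREE -n lv_r0 vg_r0

    # The VDO device on the logical volume, then XFS on top:
    vdo create --name=vdo0 --device=/dev/vg_r0/lv_r0
    mkfs.xfs -K /dev/mapper/vdo0          # -K: skip discards at mkfs time
    mkdir -p /mnt/vdo0
    mount -o discard /dev/mapper/vdo0 /mnt/vdo0

    # A fully allocated 20GiB qcow2 disk for the first test VM
    # (older qemu-img builds may only support preallocation=metadata):
    qemu-img create -f qcow2 -o preallocation=full /mnt/vdo0/centos7.qcow2 20G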

The Testing: We wanted to see how much data would be pushed to disk and onto the replication network when we created a clone of the virtual machines; we used 'iostat' to measure the IO on DRBD's backing disks and 'iptraf' to measure the replicated data on DRBD's replication ports during the cloning. We also recorded the peak load during each iteration of our testing to see how expensive the dedup and compression were on our dataset.
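A measurement run looked roughly like the sketch below. The backing disk, interface, and VM names are illustrative, and on CentOS 7 the iptraf binary ships as iptraf-ng:

    # Writes hitting DRBD's backing disk, in MB, sampled every 5 seconds:
    iostat -m /dev/sdb 5

    # Byte counts per TCP port on the replication interface; DRBD's
    # replication traffic shows up on the port from the resource config:
    iptraf-ng -s eth1

    # The clone operation whose IO we are measuring:
    virt-clone --original centos7 --name centos7-clone --auto-clone

    # Sampled periodically during the clone to catch the peak load:
    uptime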

The Results: On a vanilla DRBD device we saw system load climb to 2.62, the backing disk saw 20506 MB of writes, and the replication network transferred 21607 MB of data. On the DRBD-backed VDO device we saw system load climb to 3.45, the backing disk saw 2833 MB of writes, and the replication network transferred 3234 MB of data. Look at those savings! VDO reduced the replication network and storage utilization by about 85% while only increasing load by ~0.8.
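For the skeptics, those percentages fall straight out of the numbers above:

    storage:  1 - 2833/20506 ≈ 0.862  ->  ~86% fewer MB written
    network:  1 - 3234/21607 ≈ 0.850  ->  ~85% less replication traffic
    load:     3.45 - 2.62    = 0.83   ->  the ~0.8 load increase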

This was a small dataset intended as a proof of concept, but it isn't hard to imagine other datasets that could benefit from deduplication. It's also not hard to see the benefit of deduplication when replicating over slow (WAN) networks, where every byte of bandwidth counts. Pairing VDO with DRBD and DRBD Proxy seems like a win to us!

Check out the official press release: HERE

[1] See http://permabit.com/products-overview/albireo-virtual-data-optimizer-vdo/
[2] See http://www.drbd.org/en/doc/users-guide-84/s-lvm-drbd-as-pv; note that on RHEL/CentOS 7 you also need to disable lvmetad in lvm.conf and in systemd.
