DRBD Proxy 3.1: Performance improvements

The threading model in DRBD Proxy 3.1 received a complete overhaul; below you can see the performance implications of these changes. First of all, because the old model suffered from having to serve both the low-latency meta-data connections and the high-bandwidth data connections with the same threads, a second set of pthreads has been added. The first set runs at the (normally negative) nice level the DRBD Proxy process is started at, while the second set, in order to be “nicer” to the other processes, adds +10 to its nice level and therefore gets a smaller share of the CPU time.
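As a rough illustration of the idea (this is a minimal sketch, not DRBD Proxy source code), two thread sets at different nice levels can be set up like this on Linux, where the nice value is per thread and can be adjusted via the thread's TID:

    /* Sketch: one latency-sensitive worker at the inherited nice level,
     * one bulk worker reniced by +10. Linux-specific per-thread nice. */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <stdio.h>
    #include <sys/resource.h>
    #include <sys/syscall.h>
    #include <sys/types.h>
    #include <unistd.h>

    static void *meta_worker(void *arg)
    {
        (void)arg;
        /* Runs at whatever nice level the process was started with
         * (typically negative), so meta-data traffic stays low-latency. */
        return NULL;
    }

    static void *data_worker(void *arg)
    {
        (void)arg;
        pid_t tid = (pid_t)syscall(SYS_gettid);
        /* Be "nicer": add +10 to this thread's nice value only. */
        if (setpriority(PRIO_PROCESS, tid,
                        getpriority(PRIO_PROCESS, tid) + 10) != 0)
            perror("setpriority");
        /* ... handle high-bandwidth data connections ... */
        return NULL;
    }

    int main(void)
    {
        pthread_t meta, data;
        pthread_create(&meta, NULL, meta_worker, NULL);
        pthread_create(&data, NULL, data_worker, NULL);
        pthread_join(meta, NULL);
        pthread_join(data, NULL);
        return 0;
    }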

Secondly, the internal processing has been changed, too. This isn’t visible externally, of course – you can only notice the performance improvements.

(Graph: DRBD Proxy 3.1 buffer usage over time)

In the example graph above, a few distinct sections can be seen:

  • From 0 to about 11.5 seconds the Proxy buffer gets filled. In case anyone’s interested, here’s the dd output:
    3712983040 Bytes (3.7 GB) copied, 11.4573 s, 324 MB/s
  • Until about 44 seconds, lzma compression is active, with a single context. Slow, but it compresses best.
  • Then I switched to zlib; this is a fair bit faster. All cores are being used, so external requests (by some VMs and other processes) show up as irregular spikes. (Different compression ratios for the various input data are “at fault”, too.) See the configuration sketch after this list for how the compression plugin is selected.
  • At 56 seconds the compression is turned off completely; the time needed for the rest of the data (3 GiB in about 13 seconds) shows the bonded-ethernet bandwidth of about 220 MB/sec.
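Switching between lzma, zlib, and no compression is done in the proxy plugin section of the resource configuration. The fragment below is only an illustrative sketch with made-up host names, addresses, and sizes; the exact option names and values should be checked against the DRBD Proxy documentation for your version:

    resource r0 {
        net { protocol A; }

        device    /dev/drbd0;
        disk      /dev/sdb1;
        meta-disk internal;

        proxy {
            memlimit 4G;                  # size of the proxy buffer (assumed value)
            plugin {
                lzma contexts 1 level 9;  # best ratio, but slow
                # zlib level 9;           # noticeably faster, lower ratio
                # no compression plugin configured: data is forwarded uncompressed
            }
        }

        on alice {
            address 10.0.0.1:7789;
            proxy on alice {
                inside  127.0.0.1:7788;
                outside 192.168.1.1:7790;
            }
        }
        on bob {
            address 10.0.0.2:7789;
            proxy on bob {
                inside  127.0.0.1:7788;
                outside 192.168.1.2:7790;
            }
        }
    }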

For two sets of test machines (8 vs. 16 CPUs at 1.8 GHz each, with DRBD and the Proxy running on the same machine), a plausible rate for transferring large blocks (e.g. with 1 MB bios, for streaming writes or during a resync with 8.4.x) into the Proxy buffers is 450-500 MiB/sec, limited by the kernel/userspace memcpy() performance.
For small buffers a few code paths are not fully optimized yet (e.g. memory allocation); further improvements are to be expected in the next versions, too.
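The memcpy() ceiling mentioned above can be estimated with a trivial micro-benchmark such as the sketch below (buffer size and iteration count are arbitrary choices; this is not how the Proxy measures anything, it only shows the order of magnitude of a userspace copy):

    /* Rough memcpy() throughput estimate; results depend heavily on
     * buffer size, CPU, and memory configuration. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    int main(void)
    {
        const size_t size = 64UL << 20;   /* 64 MiB per copy */
        const int iterations = 64;        /* 4 GiB copied in total */
        char *src = malloc(size), *dst = malloc(size);
        if (!src || !dst)
            return 1;
        memset(src, 0xaa, size);          /* fault the pages in first */
        memset(dst, 0x55, size);

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < iterations; i++)
            memcpy(dst, src, size);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        double mib  = (double)size * iterations / (1 << 20);
        printf("%.0f MiB copied in %.2f s -> %.0f MiB/s\n", mib, secs, mib / secs);

        free(src);
        free(dst);
        return 0;
    }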

The roadmap for the near future includes a shared memory pool for all connections and WAN bandwidth shaping (i.e. limiting the outgoing rate to some configured value), as well as some more ideas that have to be researched first.
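Bandwidth shaping of this kind is typically done with something like a token bucket; the sketch below only illustrates the general concept and says nothing about how DRBD Proxy will actually implement it:

    /* Generic token-bucket rate limiter sketch: before sending 'bytes',
     * wait until enough allowance (tokens, counted in bytes) is available.
     * For simplicity this assumes bytes <= burst. */
    #include <stdint.h>
    #include <time.h>

    struct token_bucket {
        double rate;            /* allowed bytes per second */
        double burst;           /* maximum accumulated allowance in bytes */
        double tokens;          /* currently available allowance */
        struct timespec last;   /* time of the last refill */
    };

    static double elapsed(const struct timespec *a, const struct timespec *b)
    {
        return (b->tv_sec - a->tv_sec) + (b->tv_nsec - a->tv_nsec) / 1e9;
    }

    void tb_init(struct token_bucket *tb, double rate, double burst)
    {
        tb->rate = rate;
        tb->burst = burst;
        tb->tokens = burst;
        clock_gettime(CLOCK_MONOTONIC, &tb->last);
    }

    /* Block (by sleeping) until 'bytes' may be sent, then consume them. */
    void tb_consume(struct token_bucket *tb, uint64_t bytes)
    {
        for (;;) {
            struct timespec now;
            clock_gettime(CLOCK_MONOTONIC, &now);
            tb->tokens += elapsed(&tb->last, &now) * tb->rate;
            if (tb->tokens > tb->burst)
                tb->tokens = tb->burst;
            tb->last = now;

            if (tb->tokens >= (double)bytes) {
                tb->tokens -= (double)bytes;
                return;
            }
            /* Not enough allowance yet: sleep for the missing amount. */
            double wait = ((double)bytes - tb->tokens) / tb->rate;
            struct timespec ts = {
                (time_t)wait, (long)((wait - (time_t)wait) * 1e9)
            };
            nanosleep(&ts, NULL);
        }
    }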

Opinions? Contact us!
