NVMe Over Fabrics

NVMe-oF and iSCSI (iSER)

As a Linux storage company, LINBIT is always excited for an opportunity to work with the latest developments in storage. One of these new technologies is NVMe over Fabrics (NVMe-oF). NVMe is a device specification for non-volatile memory attached via the PCIe bus, and NVMe-oF lets us speak that same protocol over a network; you can think of the relationship as roughly analogous to SCSI and iSCSI. Also, much like iSCSI, you are not actually required to use an NVMe device as the backing storage for an NVMe-oF target. This makes NVMe-oF a great way to attach DRBD-backed storage clusters to hypervisors, container hosts, or applications of many types.
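
For illustration, here is a minimal sketch of exporting a non-NVMe block device (a DRBD volume in this case) as an NVMe-oF namespace through the Linux nvmet configfs interface. The NQN, device path, and IP address are placeholders, not the configuration used in the tests below.

# Load the NVMe-oF target module for the RDMA transport (pulls in nvmet as well).
modprobe nvmet-rdma

# Create a subsystem; the NQN below is a placeholder.
mkdir /sys/kernel/config/nvmet/subsystems/nqn.2019-01.com.example:drbd0
cd /sys/kernel/config/nvmet/subsystems/nqn.2019-01.com.example:drbd0
echo 1 > attr_allow_any_host

# Any block device can back a namespace -- here a DRBD device, not a raw NVMe drive.
mkdir namespaces/1
echo /dev/drbd0 > namespaces/1/device_path
echo 1 > namespaces/1/enable

# Expose the subsystem on an RDMA-capable port (address is a placeholder).
mkdir /sys/kernel/config/nvmet/ports/1
cd /sys/kernel/config/nvmet/ports/1
echo rdma         > addr_trtype
echo ipv4         > addr_adrfam
echo 192.168.1.10 > addr_traddr
echo 4420         > addr_trsvcid
ln -s /sys/kernel/config/nvmet/subsystems/nqn.2019-01.com.example:drbd0 subsystems/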

The parallels between NVMe-oF and iSCSI are obvious, so I naturally wanted to run some tests to see how the two compare. I had originally intended to compare iSCSI to NVMe over TCP, but soon found out those patches were not yet merged upstream. Since I still intended to test over Ethernet interfaces, I quickly steered toward RoCE (RDMA over Converged Ethernet). Then, to make the comparison fairer, I used the iSER (iSCSI Extensions for RDMA) transport for iSCSI.
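
On the initiator side, connecting over the two transports looks roughly like the following, assuming nvme-cli and open-iscsi are installed. The addresses and target names are placeholders rather than the values from my lab.

# NVMe-oF over RDMA: connect to the target subsystem.
nvme connect --transport=rdma --traddr=192.168.1.10 --trsvcid=4420 \
    --nqn=nqn.2019-01.com.example:drbd0

# iSCSI with the iSER transport: discover, switch the node record to iSER, log in.
iscsiadm -m discovery -t sendtargets -p 192.168.1.10
iscsiadm -m node -T iqn.2019-01.com.example:target0 -p 192.168.1.10 \
    --op update -n iface.transport_name -v iser
iscsiadm -m node -T iqn.2019-01.com.example:target0 -p 192.168.1.10 --login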

The systems in use are relatively new Intel Core i7 machines (i7-7820X). The single CPU has 16 threads and a clock speed of 3.6GHz. Both systems have 64GiB of DDR4-2133 memory. The storage is 3x 512GB Samsung 970 PRO NVMe drives configured in RAID 0 via Linux software RAID. The network between the initiator and target was two directly connected Mellanox ConnectX-5 interfaces bonded using mode 2 (balance-xor).
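
Recreating that storage and network layout would look roughly like this; the NVMe and network interface names are assumptions, and a production RoCE deployment would typically also involve flow-control tuning that is out of scope here.

# Stripe the three NVMe drives into a single RAID 0 array.
mdadm --create /dev/md0 --level=0 --raid-devices=3 \
    /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1

# Bond the two ConnectX-5 ports in mode 2 (balance-xor).
ip link add bond0 type bond mode balance-xor
ip link set ens1f0 down && ip link set ens1f0 master bond0
ip link set ens1f1 down && ip link set ens1f1 master bond0
ip link set bond0 up
ip addr add 192.168.1.10/24 dev bond0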

The tests were all focused on IO operations per second (IOPS) with 4k block sizes. The backing disks were configured to use the mq-deadline scheduler. All tests were performed using fio version 3.13. Each direct IO test ran for 30 seconds using 16 jobs and an iodepth of 32, with the libaio ioengine. The exact fio command can be found in the footnotes.¹
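
Switching the IO scheduler is a one-liner per device via sysfs; a minimal sketch, assuming the backing drives show up as nvme0n1 through nvme2n1:

# Select mq-deadline for each backing NVMe drive (device names are assumptions).
for dev in nvme0n1 nvme1n1 nvme2n1; do
    echo mq-deadline > /sys/block/$dev/queue/scheduler
done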


Much to my surprise, iSCSI with iSER outperformed NVMe-oF in sequential writes. However, iSCSI really struggled with random IO, both reads and writes: here NVMe-oF outperformed iSCSI by roughly 550%. With the exception of the iSCSI random IO and the NVMe-oF sequential writes, most tests performed nearly on par with the raw hardware when tested locally, coming in at well over 1 million IOPS! If you have any random-IO-intensive workloads, it might be time to consider implementing NVMe-oF.

Let us know in the comments below if you’re looking to make the jump to NVMe-oF, if you’ve already made the jump and the differences you’ve seen, or if you have any questions regarding our test environment.

Footnotes

1. /usr/local/bin/fio --name test$i --filename $BLOCKDEVICE --ioengine libaio --direct 1 --rw $IOPATTERN --bs=4k --runtime 30s --numjobs 16 --iodepth 32 --group_reporting --append-terse

Devin Vance
First introduced to Linux back in 1996, and using Linux almost exclusively by 2005, Devin has years of Linux administration and systems engineering under his belt. He has been deploying and improving clusters with LINBIT since 2011. When not at the keyboard, you can usually find Devin wrenching on an American motorcycle or down at one of the local bowling alleys.