Guest blog by Jason Mayoral.
DRBD works by inserting a thin layer in between the file system (and the buffer cache) and the disk driver. The DRBD kernel module captures all requests from the file system and sends them down two paths.
So, how does the actual communication occur? How do two separate servers optimize data protection?
DRBD facilitates this by mirroring data between two separate servers: one server, although passive, is kept as a direct copy of the other. Any data written to the primary server is simultaneously copied to the secondary one through a real-time communication channel, so every change made on the primary is immediately replicated on the passive server.
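For illustration, here is a minimal sketch of a DRBD resource definition of the kind described above. The hostnames alpha and bravo, the backing partition /dev/sdb1, the addresses, and the resource name r0 are all placeholders, and the example assumes the fully synchronous replication protocol:

```
resource r0 {
    net {
        protocol C;              # fully synchronous: a write completes only after both nodes have it
    }
    on alpha {
        device    /dev/drbd0;    # the replicated device the file system is created on
        disk      /dev/sdb1;     # local backing disk that receives one copy of each write
        address   10.0.0.1:7789; # replication link used to ship the second copy to the peer
        meta-disk internal;
    }
    on bravo {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.2:7789;
        meta-disk internal;
    }
}
```

The file system is then created on /dev/drbd0 rather than on /dev/sdb1 directly, which is what places DRBD "in between" the file system and the disk driver.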
DRBD 8.x works on two nodes at a time: one is given the primary role, the other the secondary role. Reads and writes can only occur on the primary node.
The secondary node must not mount the file system, not even in read-only mode. While the secondary node sees every update made on the primary, it cannot expose those updates to the file system, because DRBD is completely file-system agnostic.
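As a rough sketch of what this looks like in practice (the resource name r0 and the mount point are hypothetical), the file system only ever lives on whichever node currently holds the primary role:

```
# On the node that is to act as primary:
drbdadm up r0                 # attach the backing disk and connect to the peer
drbdadm primary r0            # take the primary role (use --force only for the very first promotion)
mkfs.ext4 /dev/drbd0          # only once ever, when the file system is first created
mount /dev/drbd0 /mnt/data    # reads and writes happen here, on the DRBD device

# On the secondary node the resource is also brought up, but /dev/drbd0 stays unmounted:
drbdadm up r0
drbdadm status r0             # DRBD 9.x; on 8.x, check /proc/drbd instead
```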
One write goes to the actual local disk and another to a mirrored disk on the peer node. If the first node fails, the file system can be brought up on the peer node and the data remains available for use.
DRBD has no precise knowledge of the file system and, as such, has no way of communicating changes upstream to the file system driver. The two-at-a-time rule does not, however, prevent DRBD from operating on more than two nodes.
Moreover, DRBD 9.x supports multiple peer nodes, meaning one peer might be a synchronous mirror in the local datacenter while another might be an asynchronous mirror at a remote site.
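As a very rough sketch of such a mixed setup (hostnames, addresses, node IDs, and the protocol choices are hypothetical, and exact syntax varies between DRBD 9 releases), per-connection protocols could look like this:

```
resource r0 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    meta-disk internal;

    on alpha   { address 10.0.0.1:7789;     node-id 0; }
    on bravo   { address 10.0.0.2:7789;     node-id 1; }
    on charlie { address 192.168.50.3:7789; node-id 2; }

    connection {                  # local pair: fully synchronous
        host alpha; host bravo;
        net { protocol C; }
    }
    connection {                  # remote site: asynchronous
        host alpha; host charlie;
        net { protocol A; }
    }
    # A real three-node setup would normally also define the bravo/charlie connection.
}
```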
Again, the passive server only takes over when the primary one fails. When such a failure occurs, Pacemaker immediately detects it and shifts operation to the secondary server. This failover can be either automatic or manual; users who prefer manual failover must explicitly authorize the switch to the passive server after the primary one fails.
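A typical way to wire this up is a Pacemaker resource that manages the DRBD role. The sketch below uses the classic crm shell syntax and a hypothetical resource named r0; it is only meant to show the shape of such a configuration, not a complete cluster setup:

```
# Pacemaker (crm shell) sketch: let the cluster decide which node is DRBD primary
primitive p_drbd_r0 ocf:linbit:drbd \
    params drbd_resource=r0 \
    op monitor interval=29s role=Master \
    op monitor interval=31s role=Slave
ms ms_drbd_r0 p_drbd_r0 \
    meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
```

With this in place, Pacemaker promotes the surviving node to Master when the current primary fails; leaving that promotion to an administrator instead corresponds to the manual failover mode mentioned above.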
Ceph is open source software intended to provide highly scalable object, block, and file-based storage in a unified system.
Ceph storage clusters are designed to run on commodity hardware, using an algorithm called CRUSH (Controlled Replication Under Scalable Hashing) to ensure data is evenly distributed across the cluster and that all cluster nodes can retrieve data quickly without any centralized bottlenecks.
Ceph object storage is available through Representational State Transfer (REST)-based application programming interfaces (APIs) compatible with Amazon Simple Storage Service (S3) and OpenStack Swift, as well as through a native API for integration with software applications.
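As a small illustration of the S3-compatible path, the following Python sketch talks to a RADOS Gateway endpoint using the boto3 library; the endpoint URL, credentials, and bucket name are placeholders:

```python
import boto3

# Point a standard S3 client at the RADOS Gateway instead of AWS.
s3 = boto3.client(
    "s3",
    endpoint_url="http://rgw.example.com:7480",   # hypothetical RGW endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

s3.create_bucket(Bucket="demo-bucket")
s3.put_object(Bucket="demo-bucket", Key="hello.txt",
              Body=b"stored via the S3-compatible API")

obj = s3.get_object(Bucket="demo-bucket", Key="hello.txt")
print(obj["Body"].read())
```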
Ceph block storage uses a Ceph Block Device, which is a virtual disk that can be attached to bare-metal Linux-based servers or virtual machines. The Ceph Reliable Autonomic Distributed Object Store (RADOS) provides block storage capabilities, such as snapshots and replication. The Ceph RADOS Block Device is integrated to work as a back end with OpenStack Block Storage.
Ceph implements distributed object storage. Ceph’s software libraries provide client applications with direct access to the reliable autonomic distributed object store (RADOS) object-based storage system, and also provide a foundation for some of Ceph’s features, including RADOS Block Device (RBD), RADOS Gateway, and the Ceph File System.
The librados software libraries provide access in C, C++, Java, PHP, and Python. The RADOS Gateway also exposes the object store as a RESTful interface which can present as both native Amazon S3 and OpenStack Swift APIs.
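For example, the Python binding (the rados module that ships with Ceph) exposes the object store directly. The pool name and object name below are placeholders, and the sketch assumes a reachable cluster described by /etc/ceph/ceph.conf:

```python
import rados

# Connect to the cluster using the local Ceph configuration and keyring.
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()

try:
    # An I/O context is bound to one pool; objects live inside pools.
    ioctx = cluster.open_ioctx("demo-pool")
    try:
        ioctx.write_full("greeting", b"hello from librados")  # store an object
        print(ioctx.read("greeting"))                          # read it back
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```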
Ceph’s object storage system allows users to mount Ceph as a thin-provisioned block device. When an application writes data to Ceph using a block device, Ceph automatically stripes and replicates the data across the cluster. Ceph’s RADOS Block Device (RBD) also integrates with Kernel-based Virtual Machines (KVMs).
Ceph RBD interfaces with the same Ceph object storage system that provides the librados interface and the CephFS file system, and it stores block device images as objects. Since RBD is built on librados, RBD inherits librados’s abilities, including read-only snapshots and revert to snapshot. By striping images across the cluster, Ceph improves read access performance for large block device images.
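The same pattern applies to block storage through the rbd Python binding. The sketch below creates a thin-provisioned image, takes a read-only snapshot, and reverts to it; the pool, image, and snapshot names are placeholders:

```python
import rados
import rbd

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx("rbd")                 # hypothetical pool holding RBD images

try:
    # Create a 1 GiB thin-provisioned image; space is consumed only as data is written.
    rbd.RBD().create(ioctx, "demo-image", 1024 ** 3)

    with rbd.Image(ioctx, "demo-image") as image:
        image.write(b"block data", 0)             # write at offset 0
        image.create_snap("clean-state")          # read-only snapshot, as described above
        image.write(b"newer data", 0)
        image.rollback_to_snap("clean-state")     # revert the image to the snapshot
finally:
    ioctx.close()
    cluster.shutdown()
```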
The block device can be virtualized, providing block storage to virtual machines, in virtualization platforms such as Apache CloudStack, OpenStack, OpenNebula, Ganeti, and Proxmox Virtual Environment.
Guest blog by Jason Mayoral (www.rebelbranding.us).