People understand the importance of hospital systems needing to be Highly Available. This is easy to explain since people’s LIVES depend on medical equipment and information being accessible at all times. Likewise, people understand the importance of banks needing High Availability (HA): they expect access to their MONEY on demand and want it protected. You don’t have to be a techie to quickly understand why hospitals and banks need to be constantly available. However, the need for HA at educational institutions is a bit more difficult to identify at first, because they are not often thought of as places where ‘mission-critical’ systems are a real requirement. I believe the story is told less often because it lacks shock factor: people’s lives are not at stake, nor is their money hanging in the balance. At LINBIT, we have many educational customers, including prestigious universities, and we wanted to get their perspective on why HA and why LINBIT.
Some say that no one dreads a day of downtime like a storage admin.
I disagree. Sure, the storage admins might be responsible for recovering a whole organization if an outage occurs; and sure, they might be the ones who lose their jobs from an unexpected debacle, but I would speculate that others have more to lose.
First, the company’s reputation takes a big, possibly irreparable hit with both clients and employees. Damage control usually lasts far longer than the original outage. Take the United Airlines case from earlier in 2017, when a computer malfunction led to the grounding of all domestic flights. Airports across the country were forced to tweet out messages about the technical issues after receiving an overwhelming number of complaints. After an outage such as this one, it can take months or years to repair the trust of your customers. Depending upon the criticality of the services, a company could go bankrupt. Despite all this, even the company isn’t the biggest loser; the end-user is, and that is what the rest of this post will focus on.
Let’s say you’re a senior in college. It’s spring term, and graduation is just one week away. Your school has an online system to submit assignments which are due at midnight, the day before finals week. Like most students at the school, you log into the online assignment submission module, just like you have always done. Except this time, you get a spinning wheel. Nothing will load. It must be your internet connection. You call a friend to have them submit your papers, but she can’t log in either. The culprit: the system is down.
Now, it’s 10:00 PM and you need to submit your math assignment before midnight. At 11:00 PM you start to panic. You can’t log in and neither can your classmates. Everyone is scrambling. You send a hastily written email to your professor explaining the issue. She is unforgiving because you shouldn’t have procrastinated in the first place. At 1:00 AM, you refresh the system and everything is working (slowly), but the deadlines have passed. The system won’t let you submit anything. Your heart sinks as you realize that without that project, you will fail your math class and not be able to graduate.
This system outage caused heartache, stress, and uncertainty for the students and teachers, along with a whole lot of pain for the administrators. The kicker is that the downtime happened when traffic was anticipated to be the highest! Of course the servers are going to be overloaded during the last week of Spring term. Yet, notoriously, the University will send an email stating that it experienced higher-than-expected loads and that, ultimately, it wasn’t prepared for them.
During this time, traffic was 15 times its normal usage, and the Hypervisor hosting the NFS server and the file sharing system was flooded with requests. It blew a fan and eventually overheated. Sure, the data was still safe inside the SAN on the backend. However, none of that mattered when the students couldn’t access the data until the admin rebuilt the Hypervisor. By the time the server was back up and running, the damage was done.
High Availability isn’t a simple concept, but it is critical for your organization, your credibility, and even more importantly, for your end-users or customers. In today’s world, the bar for “uptime” is monstrously high; downtime is simply unacceptable.
If you’re a student, an admin, or a simple system user, I have a question for you (and don’t just think about yourself; think about your boss, colleagues, and clients):
What would your day look like if your services went unresponsive right… NOW?!
Learn more about the costs and drivers of data loss, and how to avoid it, by reading the paper from OrionX Research.
2017 is coming to a close, and it is a good time to look back and then look forward. Thank you to our customers, partners, and the broader open source community for your participation. 2017 was a year of many accomplishments for LINBIT. We celebrated over 1.6 million downloads of DRBD, expanded into China, and released 4 new technical guides: HA NFS on RHEL 7, HA iSCSI on RHEL 7, HA & DR for ActiveMQ, and DRBD with Encryption.
Admins and potential clients looking to get a feel for managing a DRBD/Pacemaker cluster fairly often ask us, “Do you have a sandbox cluster we can play around in?” Instead of spinning up some cloud instances and doling out access, we decided it would be better for our potential clients to be able to see how it all works in their actual environment. Ansible seemed like the best way to create a “one size fits all” solution for deploying such clusters into an unknown environment, and after a few days hacking together a playbook, it proved to be a good choice.
The end result was an Ansible playbook that can deploy a few different cluster configurations onto a pair of nodes in any environment. The playbook prompts the user for some inputs that specify which type of cluster to deploy, which LINBIT contract to register the target nodes with, and which credentials to use for said registration; all of these could be set in your inventory file or passed via extra arguments on the command line to avoid prompting. After the playbook runs, you’re left with an initialized DRBD device and Pacemaker cluster at the very least, or a full-blown HA cluster serving out either iSCSI or NFS (expect more later) that you can test with to your heart’s content.
You can find directions and my Ansible playbook’s repo on GitHub.
LINBIT Solution Simplifies HA and DR for ActiveMQ
BEAVERTON, Ore., Dec. 6, 2017 — LINBIT, a leader in open source High Availability (HA), Software Defined Storage (SDS), Disaster Recovery (DR) and the force behind the DRBD software, today announced that it is bringing disaster recovery capabilities to Apache ActiveMQ™, the most popular open source messaging and Enterprise Integration Pattern (EIP) server software.
The LINBIT solution simplifies HA and DR for ActiveMQ because it does not require a clustered file system or shared database, a common requirement in current HA/DR implementations.
“Reliable communication in a distributed environment is a critical part of modern IT systems,” said Philipp Reisner, CEO of LINBIT. “LINBIT DR for ActiveMQ reduces cost and complexity for data centers and mitigates the risk often seen with SAN, clustered file systems, or shared databases.”
At TruckPro, the LINBIT DRBD software “is used primarily for resiliency,” stated Henry Santamaria, Director of Infrastructure. “Uptime is important for our business and anything we can do to quickly recover from any issue is paramount. Our investment in LINBIT yielded a noticeable increase in performance and stability which we did not have before.”
Known for its stability and performance over the last 15 years, LINBIT software is used by thousands of organizations across the globe, and is embedded in products from independent software vendors and established equipment manufacturers under OEM agreements. “With over 10,000 downloads per month, it is easy to see why even the most demanding environments rely on LINBIT to reduce risk and improve performance,” said Brian Hellman, LINBIT COO.
About LINBIT (http://www.linbit.com)
LINBIT is the force behind DRBD and the de facto open standard for High Availability (HA) software for enterprise and cloud computing. The LINBIT DRBD software is deployed in thousands of mission-critical environments worldwide to provide High Availability (HA), Geo Clustering for Disaster Recovery (DR), and Software Defined Storage (SDS) for OpenStack based clouds. Visit us at http://www.LINBIT.com, https://twitter.com/linbit, or https://www.linkedin.com/company/linbit. LINBIT is Keeping the Digital World Running.
Guest blog by Jason Mayoral
DRBD works by inserting a thin layer in between the file system (and the buffer cache) and the disk driver. The DRBD kernel module captures all requests from the file system and splits them into two paths.
So, how does the actual communication occur? How do two separate servers optimize data protection?
DRBD facilitates communication by mirroring two separate servers – one server, although passive, is usually a direct copy of the other. Any data written to the primary server is simultaneously copied to the secondary one through a real time communication system. The passive server also immediately replicates any change made in the data.
DRBD 8.x works on two nodes at a time – one is given the role of the primary node, the other – a secondary role. Reads and writes can only occur on the primary node.
The secondary node must not mount the file system, not even in read-only mode. While it’s true to say that the secondary node sees all updates on the primary node, it can’t expose these updates to the file system, as DRBD is completely file system agnostic.
One write goes to the actual disk and another to a mirrored disk on a peer node. If the first node fails, the file system can be brought up on the peer node and the data will be available for use.
DRBD has no precise knowledge of the file system and, as such, it has no way of communicating the changes upstream to the file system driver. The two-at-a-time rule does not actually limit DRBD from operating on more than two nodes.
Moreover, DRBD 9.x supports multiple peer nodes, meaning one peer might be a synchronous mirror in the local datacenter while another secondary might be an asynchronous mirror in a remote site.
Again, the passive server only becomes functional when the primary one fails. When such a failure occurs, Pacemaker immediately recognizes the mishap and shifts to the secondary server. Automating this shift is optional, however: it can be either manual or automatic. Users who prefer manual failover must authorize the system to shift to the passive server when the primary one fails.
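To make the write path and the failover described above a little more concrete, here is a minimal toy model in Python. It is only a sketch: the `Node` and `MirroredDevice` classes, the in-memory "disks", and the node names are all made up for illustration, and none of this is DRBD's or Pacemaker's actual code or API.

```python
class Node:
    """A toy server with an in-memory 'disk' (block number -> bytes)."""

    def __init__(self, name):
        self.name = name
        self.disk = {}
        self.role = "secondary"
        self.alive = True


class MirroredDevice:
    """Hypothetical stand-in for a block device mirrored across two nodes."""

    def __init__(self, primary, secondary):
        primary.role = "primary"
        self.primary = primary
        self.secondary = secondary

    def write(self, block, data):
        if not self.primary.alive:
            raise IOError("primary is down; promote the secondary first")
        # Path 1: the write goes to the primary's own disk.
        self.primary.disk[block] = data
        # Path 2: the same write goes to the mirrored disk on the peer.
        if self.secondary.alive:
            self.secondary.disk[block] = data

    def read(self, block):
        # Reads are only served by the node currently in the primary role.
        return self.primary.disk[block]

    def failover(self):
        """Swap roles; a cluster manager or an operator would trigger this."""
        self.primary.role = "secondary"
        self.primary, self.secondary = self.secondary, self.primary
        self.primary.role = "primary"


alpha, bravo = Node("alpha"), Node("bravo")
dev = MirroredDevice(alpha, bravo)
dev.write(0, b"payroll")

alpha.alive = False        # simulate the primary failing
dev.failover()             # promote the former secondary
recovered = dev.read(0)    # the data is still available on the peer
```

Because every write landed on both nodes before the failure, the promoted node can serve the same block without restoring anything from a backup.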
Ceph is open source software intended to provide highly scalable object, block, and file-based storage in a unified system.
Ceph storage clusters are designed to run on commodity hardware, using an algorithm called CRUSH (Controlled Replication Under Scalable Hashing) to ensure data is evenly distributed across the cluster and that all cluster nodes can retrieve data quickly without any centralized bottlenecks.
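The idea of finding data without a central lookup table can be illustrated with a much simpler technique than CRUSH itself. The sketch below uses rendezvous (highest-random-weight) hashing, which is a stand-in chosen for brevity and not the CRUSH algorithm; the `place` function and the node names are hypothetical.

```python
import hashlib


def place(object_name, nodes, replicas=2):
    """Rank nodes by a hash of (object, node) and keep the top `replicas`."""

    def score(node):
        digest = hashlib.sha256(f"{object_name}:{node}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return sorted(nodes, key=score, reverse=True)[:replicas]


nodes = ["osd-a", "osd-b", "osd-c", "osd-d"]

# Any client that knows the node list computes the same answer on its own,
# so no central directory has to be consulted to locate an object.
first = place("image-0001", nodes)
second = place("image-0001", list(reversed(nodes)))
```

Both calls return the same two nodes, even though the node list was supplied in a different order, because the placement depends only on the hash of the object name and each node name.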
Ceph object storage is available through Amazon Simple Storage Service (S3) and OpenStack Swift Representational State Transfer (REST)-based application programming interfaces (APIs), and through a native API for integration with software applications.
Ceph block storage uses a Ceph Block Device, which is a virtual disk that can be attached to bare-metal Linux-based servers or virtual machines. The Ceph Reliable Autonomic Distributed Object Store (RADOS) provides block storage capabilities, such as snapshots and replication. The Ceph RADOS Block Device is integrated to work as a back end with OpenStack Block Storage.
Ceph implements distributed object storage. Ceph’s software libraries provide client applications with direct access to the reliable autonomic distributed object store (RADOS) object-based storage system, and also provide a foundation for some of Ceph’s features, including RADOS Block Device (RBD), RADOS Gateway, and the Ceph File System.
The librados software libraries provide access in C, C++, Java, PHP, and Python. The RADOS Gateway also exposes the object store as a RESTful interface which can present as both native Amazon S3 and OpenStack Swift APIs.
Ceph’s object storage system allows users to mount Ceph as a thin-provisioned block device. When an application writes data to Ceph using a block device, Ceph automatically stripes and replicates the data across the cluster. Ceph’s RADOS Block Device (RBD) also integrates with the Kernel-based Virtual Machine (KVM).
Ceph RBD interfaces with the same Ceph object storage system that provides the librados interface and the CephFS file system, and it stores block device images as objects. Since RBD is built on librados, RBD inherits librados’s abilities, including read-only snapshots and revert to snapshot. By striping images across the cluster, Ceph improves read access performance for large block device images.
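As a rough illustration of what storing a block device image as objects means, the sketch below maps a byte range of an image onto fixed-size objects. The 4 MiB object size and the `objects_for_range` helper are assumptions made for this example, and real striping layouts can be more elaborate than this.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # assumed object size for this sketch (4 MiB)


def objects_for_range(offset, length, object_size=OBJECT_SIZE):
    """Return (object_index, offset_in_object, chunk_length) for each piece."""
    pieces = []
    end = offset + length
    while offset < end:
        index = offset // object_size    # which object holds this byte
        within = offset % object_size    # position inside that object
        chunk = min(object_size - within, end - offset)
        pieces.append((index, within, chunk))
        offset += chunk
    return pieces


MIB = 1024 * 1024
# A 6 MiB request starting 3 MiB into the image touches three objects.
pieces = objects_for_range(offset=3 * MIB, length=6 * MIB)
```

Since each object can live on a different node, a large read like this one can be served by several nodes at once, which is where the read performance benefit for large images comes from.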
The block device can be virtualized, providing block storage to virtual machines in virtualization platforms such as Apache CloudStack, OpenStack, OpenNebula, Ganeti, and Proxmox Virtual Environment.
Guest blog by Jason Mayoral ( www.rebelcorp.us )
Innovative Data Storage Can Save Cash, Headaches, and Your Data
Storage Downtime is Unacceptable
When the network goes down, everyone is mildly annoyed, but when the storage goes down, “Everyone loses their mind,” as the Joker would say. And for good reason. No one likes losing payroll data, shipments, customer information, financial transactions, or CRM information… And they certainly don’t like waiting while you roll back to your latest backup. Internally and externally, data loss and downtime waste valuable resources and hurt the company’s reputation. Downtime is becoming less acceptable every day, and data loss even more so. Stable, safe, and secure storage should be a priority for those responsible for protecting their business (just ask Equifax).
Due to the increasing need for high availability (HA) and disaster recovery (DR), proprietary storage companies like NetApp and Dell EMC have provided SAN and NAS technologies to protect your organization’s most important data. These hardware appliances, many times, have no single point of failure, synchronous data replication and even a nice GUI so that users can point-and-click their way around. The downside? These storage appliances aren’t scalable and they are expensive. Really expensive.
The Obvious (or not so obvious) Alternative
Did you know that resiliency is built into your Linux OS? That’s right, built into the mainline Linux kernel is everything you need to replace your shared storage. For over 15 years, LINBIT has been creating the DRBD software, designed to synchronously replicate data between Linux servers seamlessly, just like your SAN. It can even trick the applications above it into believing they are writing to a SAN when, in reality, they are writing to standard x86, ARM, or Power boxes. The full LINBIT HA solution combines the DRBD software with open source fail-over software as well. This combination eliminates the need for proprietary shared storage solutions. So, why aren’t you using it? You probably didn’t know that it existed.
For the past 20 years, those with IT know-how and small budgets have found that HA clustering, using commodity off-the-shelf hardware, is an affordable alternative to traditional storage methods. This crowd consisted of the standard Linux hacker rolling out a home-brewed web server, and the hyperscale players who didn’t want to rely on outside vendors to build their cloud. Because these hyperscale companies are using the software to create a competitive advantage, they aren’t all that eager to share their stories. They have kept the mid-market in the dark.
Almost all of the major players (including Google, Cisco, Deka Bank, HP, Porsche, and the BBC) have realized that using standard hardware instead of proprietary appliances creates a competitive advantage. Namely: inexpensive resilient storage that their competitors are paying an arm and a leg for. Now, the storage industry’s best kept secret is finally out.
It Doesn’t Stop There
LINBIT is pioneering open source SDS. In development for over 7 years, the new solution will create standard High Availability clusters like those described above, and also work perfectly for cloud storage. The LINBIT SDS software introduces performance advantages and scalability to the design. LINBIT has created a sort of “Operating System based,” Open Source, Software Defined Storage technology that is already built into your existing operating system and ready to use with any Linux system.
The Default Replication Option
LINBIT’s DRBD software receives about 10,000 confirmed downloads per month (people who opt in to show their statistics). LINBIT is far more engineering and development focused than sales focused, so if you aren’t solving a real-world problem, you have probably never run into them. The popularity of LINBIT’s software is user driven, for 3 main reasons:
Flexibility: Since the DRBD software replicates data at the block level, it works with any filesystem, VM, or application that writes data to a hard drive. It can replicate multiple resources simultaneously so users don’t have to choose different replication technologies for every application/database running on the server.
Stability: Being accepted into the mainline Linux kernel is a very stringent process. DRBD has been in the kernel since 2009 (version 2.6.33).
Synchronous: Prior to DRBD’s availability (no pun intended), the only option for synchronous replication was hardware (SAN, NAS devices). The DRBD software can run in synchronous or asynchronous mode, and be used for local replication or Geo Clustering across long distances.
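The practical difference between the synchronous and asynchronous modes mentioned above is when a write counts as acknowledged. The toy model below sketches that difference; the `Replicator` class and its methods are invented for illustration and say nothing about how DRBD implements either mode.

```python
from collections import deque


class Replicator:
    """Toy model: when is a write acknowledged relative to the remote copy?"""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.local = {}
        self.remote = {}
        self.in_flight = deque()  # writes acknowledged but not yet on the peer

    def write(self, block, data):
        """Return once the write counts as acknowledged to the application."""
        self.local[block] = data
        if self.synchronous:
            # Synchronous: the peer has the data before we acknowledge.
            self.remote[block] = data
        else:
            # Asynchronous: acknowledge now, ship the write to the peer later.
            self.in_flight.append((block, data))

    def drain(self):
        """Deliver queued writes, e.g. once the long-distance link catches up."""
        while self.in_flight:
            block, data = self.in_flight.popleft()
            self.remote[block] = data

    def unreplicated_blocks(self):
        return [block for block, _ in self.in_flight]


sync_dev = Replicator(synchronous=True)
sync_dev.write(7, b"invoice")

async_dev = Replicator(synchronous=False)
async_dev.write(7, b"invoice")
exposure = async_dev.unreplicated_blocks()  # writes a site failure could lose
async_dev.drain()
```

In the synchronous case nothing is ever acknowledged without a remote copy, at the cost of waiting on the peer for every write. In the asynchronous case writes return sooner, which suits long distances, but a small window of acknowledged writes may not yet exist on the remote side.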
Now that DRBD has tools to provision your storage, scaling out has never been easier. Interested in how this might apply for your projects? Check out some of LINBIT’s (free) innovative technical documents which describe how to set up a cluster for your specific environment. Have an idea that isn’t covered in the documentation? Reach out to [email protected] and ask if your idea is sane. They’ll consult the LINBIT engineering team, and will point you in the right direction. Most importantly, NEVER settle for unplanned downtime.
Find out more about the costs of downtime in the podcast, The OrionX Download with LINBIT COO, Brian Hellman.
Today we’re happy to announce a new document titled “Block Replication with Filesystem Encryption” which showcases another wonderful use case for DRBD.
At HostingCon, back in April of this year, some colleagues of mine ran into some representatives from Randtronics. Randtronics is the company responsible for the DPM (Data Privacy Management) software suite. This software suite provides file encryption, user management, ACLs, and more. I could imagine this software would prove useful to those in fields where data privacy is an absolute must. Fields such as medicine, law, human resources, and intellectual property quickly come to mind.
(Graphic is property of Randtronics)
After a brief discussion with us regarding just how versatile DRBD can be, we decided to see whether DRBD could work seamlessly with DPM. Randtronics’ DPM can help protect your data from prying eyes, or those who may wish to steal it, but can it protect your data from system failures? When teamed up with DRBD, you can be assured that your data is both secure and available.
I worked briefly with Gary Lansdown of Randtronics to introduce him to asciidoc, but I must give credit to Randtronics for this document.
Every so often we get a chance to test new¹ software. Usually this opportunity is driven by the question: Does DRBD play nicely with it?
At HostingCon this year, we met a team from Atomicorp and decided that it would be interesting to see if we could get DRBD running on this hardened version of Linux. Overall, LINBIT’s broad client base loosely includes “security,” since “Availability” is one of the three pillars of the CIA triad.
Security certainly fits with Atomicorp since they focus on clients in the federal, financial, healthcare, and hosting space. Their HQ is based in the same business park as Raytheon, Boeing, and Booz Allen Hamilton, if that tells you anything about their market.
We frequently take on the challenge of seeing if we can get DRBD compiled and working correctly, like that time we installed it on two Raspberry Pis, and this case was no different. While we were confident that there wouldn’t be issues with installation (after all, it’s Linux), we needed to verify compatibility with the ASL (Atomic Secured Linux™) hardened kernel before announcing that it works.
After speaking with the Atomicorp team, they let us know that some of their clients were already running DRBD and Pacemaker for High Availability within their data centers. That’s great news! We anticipated that the testing would go quickly since we already had verified users.
Upon installing DRBD on a pair of RHEL 7 systems, we found something unexpected. DRBD is already included in the ASL kernel. This means Atomicorp is hardening and packaging a newer mainline kernel instead of hardening that which the distribution supplies. Nice work Atomicorp! The DRBD 8.4.5 version in the ASL kernel is pretty recent too.
It’s funny. Clients often ask us if we have seen DRBD used for their specific use case. DRBD is so versatile that we’re not always familiar with every situation. If we had been asked if anyone was using DRBD with Atomicorp’s ASL product, we would have said “I don’t know.” The irony here is that when you install the ASL hardened kernel, you may automatically get DRBD on a distribution where you otherwise might not have. It is available for everyone who runs Atomicorp’s ASL kernel, whether the end user leverages the replication functionality or not².
This isn’t just a fun, internal office story; this is the essence of how Open Source Software works. We now know that there is a connection between ASL and DRBD, and are delighted to work with Atomicorp moving forward. It just makes sense since end-clients of both Atomicorp and LINBIT achieve feature-sets that they wouldn’t have otherwise. Altogether, our partners help advocate for our open source software and when our solutions are combined, everyone keeps inching toward bigger and better solutions, while maintaining focus on their core competencies.
So does the DRBD software work with Atomicorp and the Atomic Secured Linux™ kernel? Of course it does; and now, for the next few weeks, I get to be mocked by my coworkers for having our engineers test something which already had our software baked into it. 😉
1: New to us.
2: You’ll still need the userland utilities to manage and initialize DRBD, but that’s less of a security concern than compiling and inserting a kernel module.
To quote the Apache Software Foundation:
Apache ActiveMQ™ is the most popular and powerful open source messaging and Integration Patterns server. Apache ActiveMQ is fast, supports many Cross Language Clients and Protocols, comes with easy to use Enterprise Integration Patterns and many advanced features while fully supporting JMS 1.1 and J2EE 1.4. Apache ActiveMQ is released under the Apache 2.0 License.
Deploying a synchronously replicated shared-nothing storage cluster (DRBD), as outlined in this guide, is a supported method for achieving HA without requiring a clustered filesystem or shared database. This method also mitigates the risk of a SAN, clustered filesystem, or shared database being a single point of failure in our persistent storage layer.