Posts

Don’t Settle for Downtime

Innovative Data Storage Can Save Cash, Headaches, and Your Data

Storage Downtime is Unacceptable

When the network goes down, everyone is mildly annoyed, but when the storage goes down, “Everyone loses their mind,” as the Joker would say. And for good reason. No one likes losing payroll data, shipments, customer information, financial transactions, or CRM information, and they certainly don’t like waiting while you roll back to your latest backup. Internally and externally, data loss and downtime waste valuable resources and hurt company reputation. Downtime is becoming less acceptable every day, and data loss even more so. Stable, safe, and secure storage should be a priority for those responsible for protecting their business (just ask Equifax).

Traditional Solutions

Due to the increasing need for high availability (HA) and disaster recovery (DR), proprietary storage companies like NetApp and Dell EMC have provided SAN and NAS technologies to protect your organization’s most important data. These hardware appliances often have no single point of failure, synchronous data replication, and even a nice GUI so that users can point-and-click their way around. The downside? These storage appliances aren’t scalable and they are expensive. Really expensive.

The Obvious (or not so obvious) Alternative

Did you know that resiliency is built into your Linux OS? That’s right, built into the mainline Linux kernel is everything you need to replace your shared storage. For over 15 years, LINBIT has been creating the DRBD software, designed to synchronously replicate data between Linux servers seamlessly, just like your SAN. It can even trick the applications above it into believing they are writing to a SAN when, in reality, the storage is standard x86, ARM, or Power boxes. The full LINBIT HA solution combines the DRBD software with open source fail-over software as well. This combination eliminates the need for proprietary shared storage solutions. So, why aren’t you using it? You probably didn’t know that it existed.

 

For the past 20 years, those with IT know-how and small budgets have found that HA clustering using commodity off-the-shelf hardware is an affordable alternative to traditional storage methods. This crowd consisted of the standard Linux hacker rolling out a home-brewed web server, and the hyperscale players who didn’t want to rely on outside vendors to build their cloud. Because these hyperscale companies use the software to create a competitive advantage, they aren’t all that eager to share their stories. They have kept the mid-market in the dark.

Almost all of the major players (including Google, Cisco, Deka Bank, HP, Porsche, and the BBC) have realized that using standard hardware instead of proprietary appliances creates a competitive advantage. Namely: inexpensive resilient storage that their competitors are paying an arm and a leg for. Now, the storage industry’s best kept secret is finally out.

It Doesn’t Stop There

LINBIT is pioneering open source SDS. In development for over 7 years, the new solution will create standard High Availability clusters like those described above, and also work perfectly for cloud storage. The LINBIT SDS software introduces performance advantages and scalability to the design. LINBIT has created a sort of “Operating System based,” Open Source, Software Defined Storage technology that is already built into your existing operating system and ready to use with any Linux system.

The Default Replication Option

LINBIT’s DRBD software receives about 10,000 confirmed downloads per month (counting only people who opt in to share their statistics). LINBIT is far more engineering and development focused than sales focused, so if you aren’t solving a real-world problem, you have probably never run into them. The software’s popularity is user driven, for three main reasons:

Flexibility: Since the DRBD software replicates data at the block level, it works with any filesystem, VM, or application that writes data to a hard drive. It can replicate multiple resources simultaneously so users don’t have to choose different replication technologies for every application/database running on the server.

Stability: Getting accepted into the mainline Linux kernel is a very stringent process. DRBD has been in the kernel since 2009, version 2.6.33.

Synchronous: Prior to DRBD’s availability (no pun intended), the only option for synchronous replication was hardware (SAN, NAS devices). The DRBD software can run in synchronous or asynchronous mode, and be used for local replication or Geo Clustering across long distances.
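To make the difference between the two modes concrete, here is a minimal toy model in Python. It is not DRBD code and does not use any DRBD API: the `Node` and `ReplicatedDevice` classes and their methods are hypothetical names invented for this sketch, and the "peer" is just an in-memory dictionary. It only illustrates the idea that a synchronous write is acknowledged after the peer has the block, while an asynchronous write is acknowledged first and shipped later.

```python
class Node:
    """A toy block device: maps block numbers to bytes."""

    def __init__(self):
        self.blocks = {}


class ReplicatedDevice:
    """Hypothetical two-node replicated device (illustration only)."""

    def __init__(self, synchronous):
        self.local = Node()
        self.peer = Node()
        self.synchronous = synchronous
        self.pending = []  # writes not yet shipped to the peer (async only)

    def write(self, block_no, data):
        self.local.blocks[block_no] = data
        if self.synchronous:
            # Synchronous: the peer has the block before we acknowledge.
            self.peer.blocks[block_no] = data
        else:
            # Asynchronous: acknowledge now, replicate later.
            self.pending.append((block_no, data))
        return "ack"

    def flush(self):
        """Ship all queued writes to the peer."""
        for block_no, data in self.pending:
            self.peer.blocks[block_no] = data
        self.pending.clear()

    def blocks_lost_on_failover(self):
        """Blocks the peer would be missing if the local node died right now."""
        return sorted(
            block_no
            for block_no, data in self.local.blocks.items()
            if self.peer.blocks.get(block_no) != data
        )


sync_dev = ReplicatedDevice(synchronous=True)
async_dev = ReplicatedDevice(synchronous=False)
for dev in (sync_dev, async_dev):
    dev.write(0, b"payroll")
    dev.write(1, b"orders")

print(sync_dev.blocks_lost_on_failover())   # []
print(async_dev.blocks_lost_on_failover())  # [0, 1]
```

In this sketch, the asynchronous device catches up once `flush()` runs. The trade-off it hints at is the usual one: synchronous mode pays for the round trip to the peer on every write, while asynchronous mode tolerates long-distance latency at the cost of a window in which the peer lags behind.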

Now that DRBD has tools to provision your storage, scaling out has never been easier. Interested in how this might apply to your projects? Check out some of LINBIT’s (free) innovative technical documents, which describe how to set up a cluster for your specific environment. Have an idea that isn’t covered in the documentation? Reach out to [email protected] and ask if your idea is sane. They’ll consult the LINBIT engineering team and point you in the right direction. Most importantly, NEVER settle for unplanned downtime.

Find out more about the costs of downtime in the podcast, The OrionX Download with LINBIT CEO, Brian Hellman.

DRBD and Randtronics DPM

Today we’re happy to announce a new document titled “Block Replication with Filesystem Encryption” which showcases another wonderful use case for DRBD.

Block Replication with Filesystem Encryption

At HostingCon, back in April of this year, some colleagues of mine ran into some representatives from Randtronics. Randtronics is the company responsible for the DPM (Data Privacy Management) software suite. This software suite provides file encryption, user management, ACLs, and more. I could imagine this software would prove useful to those in fields where data privacy is an absolute must. Fields such as medicine, law, human resources, or intellectual property quickly come to mind.

(Graphic is property of Randtronics)

After a brief discussion with us regarding just how versatile DRBD can be, it was decided to see if DRBD could work seamlessly with DPM. Randtronics’ DPM can help protect your data from prying eyes, or those who may wish to steal it, but can it protect your data from system failures? When it is teamed up with DRBD, you can be assured that your data is both secure and available.

I worked briefly with Gary Lansdown of Randtronics to introduce him to asciidoc, but I must give credit to Randtronics for this document.

Secure Linux: Atomicorp includes DRBD for replication

Every so often we get a chance to test new¹ software. Usually this opportunity is driven by the question: Does DRBD play nicely with it?

At HostingCon this year, we met a team from Atomicorp and decided that it would be interesting to see if we could get DRBD running on this hardened version of Linux. Overall, LINBIT’s broad client base loosely includes “security,” since “Availability” is one of the three pillars of the CIA triad.

 

Image Source: Panmore Institute

Security certainly fits with Atomicorp since they focus on clients in the federal, financial, healthcare, and hosting space. Their HQ is based in the same business park as Raytheon, Boeing, and Booz Allen Hamilton, if that tells you anything about their market.

We frequently take on the challenge of seeing if we can get DRBD compiled and working correctly, like that time we installed it on two Raspberry Pis, and this case was no different. While we were confident that there wouldn’t be issues with installation (after all, it’s Linux), we needed to verify compatibility with the ASL (Atomic Secured Linux™) hardened kernel before announcing that it works.

After speaking with the Atomicorp team, they let us know that some of their clients were already running DRBD and Pacemaker for High Availability within their data centers. That’s great news! We anticipated that the testing would go quickly since we already had verified users.

Upon installing DRBD on a pair of RHEL 7 systems, we found something unexpected. DRBD is already included in the ASL kernel. This means Atomicorp is hardening and packaging a newer mainline kernel instead of hardening the one the distribution supplies. Nice work, Atomicorp! The DRBD 8.4.5 version in the ASL kernel is pretty recent too.

It’s funny. Clients often ask us if we have seen DRBD used for their specific use case. DRBD is so versatile that we’re not always familiar with every situation. If we had been asked if anyone was using DRBD with Atomicorp’s ASL product, we would have said “I don’t know.” The irony here is that when you install the ASL hardened kernel, you may automatically get DRBD on a distribution where you otherwise might not have. It is available to everyone who runs Atomicorp’s ASL kernel, whether the end user leverages the replication functionality or not².

This isn’t just a fun, internal office story; this is the essence of how Open Source Software works. We now know that there is a connection between ASL and DRBD, and are delighted to work with Atomicorp moving forward. It just makes sense since end-clients of both Atomicorp and LINBIT achieve feature-sets that they wouldn’t have otherwise. Altogether, our partners help advocate for our open source software and when our solutions are combined, everyone keeps inching toward bigger and better solutions, while maintaining focus on their core competencies.

So does the DRBD software work with Atomicorp and the Atomic Secured Linux™ kernel? Of course it does; and now, for the next few weeks, I get to be mocked by my coworkers for having our engineers test something which already had our software baked into it. 😉

 

1: New to us.
2: You’ll still need the userland utilities to manage and initialize DRBD, but that’s less of a security concern than compiling and inserting a kernel module.

Dreaded Day of Downtime

Some say that no one dreads a day of downtime like a storage admin.

I disagree. Sure, the storage admins might be responsible for recovering a whole organization if an outage occurs; and sure, they might be the ones who lose their jobs from an unexpected debacle, but I would speculate that others have more to lose.

First, the company’s reputation takes a big, possibly irreparable hit with both clients and employees. Damage control usually lasts far longer than the original outage. Take the United Airlines case from earlier in 2017, when a computer malfunction led to the grounding of all domestic flights. Airports across the country were forced to tweet out messages about the technical issues after receiving an overwhelming number of complaints. After an outage such as this one, it can take months or years to repair trust with your customers. Depending upon the criticality of the services, a company could go bankrupt. Despite all this, even the company isn’t the biggest loser. That is the end user, and that is what the rest of this post will focus on.

Let’s say you’re a senior in college. It’s spring term, and graduation is just one week away. Your school has an online system to submit assignments, which are due at midnight the day before finals week. Like most students at the school, you log into the online assignment submission module, just like you have always done. Except this time, you get a spinning wheel. Nothing will load. It must be your internet connection. You call a friend to have her submit your papers, but she can’t log in either. The culprit: the system is down.

Now, it’s 10:00 PM and you need to submit your math assignment before midnight. At 11:00 PM you start to panic. You can’t log in and neither can your classmates. Everyone is scrambling. You send a hastily written email to your professor explaining the issue. She is unforgiving because you shouldn’t have procrastinated in the first place. At 1:00 AM, you refresh the system and everything is working (slowly), but the deadlines have passed. The system won’t let you submit anything. Your heart sinks as you realize that without that assignment, you will fail your math class and not be able to graduate.

This system outage caused heartache, stress, and uncertainty for the students and teachers, along with a whole lot of pain for the administrators. The kicker is that the downtime happened when traffic was anticipated to be the highest! Of course the servers are going to be overloaded during the last week of spring term. Yet, notoriously, the university will send an email stating that it experienced higher-than-expected loads and that, ultimately, it wasn’t prepared for them.

During this time, traffic was 15 times its normal level, and the hypervisor hosting the NFS server and the file sharing system was flooded with requests. It blew a fan and eventually overheated. Sure, the data was still safe inside the SAN on the backend. However, none of that mattered when the students couldn’t access the data until the admin rebuilt the hypervisor. By the time the server was back up and running, the damage was done.

High Availability isn’t a simple concept, but it is critical for your organization, your credibility, and, even more importantly, your end users or customers. In today’s world, the bar for “uptime” is monstrously high; therefore, downtime is simply unacceptable.
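To put a number on how high that bar is, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the availability targets below are common examples chosen for this sketch, not figures from this post, and the function name `downtime_hours_per_year` is made up here.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_hours_per_year(availability_percent):
    """Hours of downtime per year permitted by an availability target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


# Illustrative targets only (assumptions, not numbers from this post).
for target in (99.0, 99.9, 99.99):
    hours = downtime_hours_per_year(target)
    print(f"{target}% uptime allows about {hours:.2f} hours of downtime per year")
```

Each additional "nine" cuts the yearly downtime budget by a factor of ten, which is why a single overheated hypervisor on deadline night can use up the whole allowance.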

If you’re a student, an admin, or a simple system user, I have a question for you (and don’t just think about yourself; think about your boss, colleagues, and clients):

What would your day look like if your services went unresponsive right… NOW?!
Learn more about the costs and drivers of data loss, and how to avoid it, by reading the paper from OrionX Research.

Would you want to be your own car mechanic?

Data seems to be on everyone’s mind these days. From employee data to financial data, your company has to keep it available through seamless replication, without downtime. LINBIT DRBD is the open source software that ensures High Availability for your enterprise.

Read more

Persistent and Replicated Docker Volumes with DRBD9 and DRBD Manage

Docker now supports plugins, and for LINBIT, volume plugins are certainly the most interesting kind. They open the way for storing the contents of ordinary Docker volumes on DRBD-backed storage.

In this blog post we show a simple example of using our new Docker volume plugin to create a WordPress-powered blog with a MariaDB database, where both the content of the blog and the database are replicated between two cluster nodes. Read more

Testing SSD Drives with DRBD: Intel DC 3700 Series

Over the next few weeks we’ll be posting results from tests that we’ve run against SSDs from various manufacturers, including Intel, SanDisk, and Micron, to name a few.

The first post in this series goes over our findings for the Intel DC S3700 Series 800GB SATA SSDs. Read more

Change the cluster distribution without downtime

We recently upgraded one of our virtualization clusters (more RAM) and, in the course of this, upgraded the virtualization hosts from Ubuntu Lucid to RHEL 6.3, all without any service interruption. Read more

Backup ideas: using a double-stacked setup

Have you ever wanted to do a file-based backup of your data without impacting your application, and without stopping your HA replication? Here is one possible method. Read more

Mirrored SAN vs. DRBD

Every now and then we get asked, “Why not simply use a mirrored SAN instead of DRBD?” This post shows some important differences. Read more