Posts

Dreaded Day of Downtime

Some say that no one dreads a day of downtime like a storage admin.

I disagree. Sure, the storage admins might be responsible for recovering a whole organization if an outage occurs, and sure, they might be the ones who lose their jobs after an unexpected debacle, but I would speculate that others have more to lose.

First, the company’s reputation takes a big, possibly irreparable hit with both clients and employees. Damage control usually lasts far longer than the original outage. Take the United Airlines case from earlier in 2017, when a computer malfunction led to the grounding of all domestic flights. Airports across the country were forced to tweet out messages about the technical issues after receiving an overwhelming number of complaints. After an outage such as this one, it can take months or years to repair trust with your customers. Depending on how critical the services are, a company could even go bankrupt. Despite all this, even the company isn’t the biggest loser. The end user is, and that is what the rest of this post will focus on.

Let’s say you’re a senior in college. It’s spring term, and graduation is just one week away. Your school has an online system for submitting assignments, which are due at midnight on the day before finals week. Like most students at the school, you log in to the online assignment submission module, just as you have always done. Except this time, you get a spinning wheel. Nothing will load. It must be your internet connection. You call a friend to have her submit your papers, but she can’t log in either. The culprit: the system is down.

Now it’s 10:00 PM, and you need to submit your math assignment before midnight. At 11:00 PM you start to panic. You can’t log in and neither can your classmates. Everyone is scrambling. You send a hastily written email to your professor explaining the issue. She is unforgiving because you shouldn’t have procrastinated in the first place. At 1:00 AM, you refresh the system and everything is working (slowly), but the deadlines have passed. The system won’t let you submit anything. Your heart sinks as you realize that without that project, you will fail your math class and not be able to graduate.

This system outage caused heartache, stress, and uncertainty for the students and teachers, along with a whole lot of pain for the administrators. The kicker is that the downtime happened exactly when traffic was anticipated to be the highest! Of course the servers are going to be overloaded during the last week of spring term. Yet, notoriously, the university will send an email stating that it experienced higher-than-expected loads and that, ultimately, it wasn’t prepared for them.

During this time, traffic was 15 times its normal level, and the hypervisor hosting the NFS server and the file sharing system was flooded with requests. It blew a fan and eventually overheated. Sure, the data was still safe inside the SAN on the backend. However, none of that mattered, because the students couldn’t access the data until the admin rebuilt the hypervisor. By the time the server was back up and running, the damage was done.

High availability isn’t a simple concept, but it is critical for your organization, your credibility, and, even more importantly, your end users or customers. In today’s world, the bar for “uptime” is monstrously high, so downtime is simply unacceptable.
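To put a number on how high that bar is, here is a minimal sketch that converts an uptime target into a yearly downtime budget. The function name and the list of targets are illustrative assumptions, not figures from any particular service agreement, and the calculation assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the downtime budget, in minutes per year, for an uptime target.

    The name and signature are hypothetical, chosen for this illustration.
    """
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    unavailable_fraction = (100 - availability_percent) / 100
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Commonly quoted targets, used here only as examples.
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(target)
        print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, a 99.99% target leaves less than an hour of downtime for the entire year, so a single evening outage like the one in the story would already blow through it.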

If you’re a student, an admin, or a simple system user, I have a question for you (and don’t just think about yourself; think about your boss, colleagues, and clients):

What would your day look like if your services went unresponsive right… NOW?!
Learn more about the costs and drivers of data loss, and how to avoid it, by reading the paper from OrionX Research.

 

Greg Eckert
In his role as the Director of Business Development for LINBIT America and Australia, Greg is responsible for building international relations, both in terms of technology and business collaboration. Since 2013, Greg has connected potential technology partners, collaborated with businesses in new territories, and explored opportunities for new joint ventures.

DRBD 9 over RDMA with Micron SSDs

We have been testing some 240GB Micron M500DC SSDs with DRBD 9 and DRBD’s RDMA transport layer. Micron, based in Boise, Idaho, is a leader in NAND flash production and storage. We found that their M500DC SSDs are write-optimized for data center use cases and, in some cases, exceeded the expected performance.

Read more

Having Fun with the DRBD Manage Control Volume

DRBDmanage has been replaced by LINSTOR!

To find out more about LINSTOR, check out the following blog articles:

  1. Cluster-wide management of replicated storage with LINSTOR
  2. The Technology inside LINSTOR (Part 1)
  3. The Technology inside LINSTOR (Part 2)
  4. How to setup LINSTOR on Proxmox VE

Thank you and have a good read!

 

As you might know, DRBD Manage is a tool used in the DRBD 9 stack to manage (create, remove, snapshot) DRBD resources in a multi-node DRBD cluster. DRBD Manage stores the cluster information in the so-called control volume. The control volume is itself a DRBD 9 resource, which is replicated across the whole cluster. This means that the control volume is just a block device, like all the regular DRBD resources. Read more

DRBD and SSD: I was made for loving you

When DRBD 8.4.4 integrated TRIM/Discard support, a lot of things got much better… for example, 700MB/sec over a 1GBit/sec connection. Read more

DRBDManage release 0.10

As already announced in another blog post, we’re preparing a new tool to simplify DRBD administration. Now we’re publishing its first release! Read more

DRBD-Manager

One of the projects that LINBIT will publish soon (with an open source license; Git will be the preferred way to help us) is drbdmanage, which allows easy cluster-wide storage administration with DRBD 9. Read more

“umount is too slow”

A question we see over and over again is:

Why is umount so slow? Why does it take so long?

Part of the answer was already given in an earlier blog post; here’s some more explanation. Read more

Backup ideas: using a double-stacked setup

Have you ever wanted to do a file-based backup of your data without impacting your application, and without stopping your HA replication? Here is one possible method. Read more

Mirrored SAN vs. DRBD

Every now and then we get asked, “Why not simply use a mirrored SAN instead of DRBD?” This post shows some important differences. Read more