Everyone has experienced the heartbreak of a failed hard drive… that sickening realization that your music, photos, and important documents are gone, victims of statistical likelihood. Luckily, there’s RAID, which in its mirrored form (RAID 1) keeps identical copies of your data on two drives so that you aren’t out of luck if one dies. Amanda and I have relied on it for years now, storing all our data on a dedicated file server whose sole purpose is to manage a RAID array and make it available to all the computers on our network.
But what happens when the file server itself won’t boot anymore, as we discovered after returning from Christmas vacation? We were able to recover our data by connecting our drives to a friend’s machine… but what if the house had burned down? Sure, we had two copies of our data, but both would have been reduced to ashes.
As I ran through these possibilities, a few shortcomings of my reliance on RAID alone came to light:
- Both drives are equally vulnerable to external circumstances. Having two drives sitting right next to each other means they’ll both be compromised when a disaster occurs. Insurance can buy me new drives, but can’t recreate my data.
- The RAID controller is a single point of failure. When one drive dies, you can swap in a fresh one — but what happens when the machine that the drives are connected to dies? Worse, if you’re using a hardware RAID implementation, will you be able to find compatible hardware to replace the original system?
- RAID isn’t a backup. When you delete a file from the RAID array, that file is deleted from both drives. RAID protects you from hard drive failures, but can’t retrieve a file you accidentally deleted last week. (JournalSpace learned this the hard way.)
I came up with several ways to mitigate all three of these issues:
- Use an online backup system like Mozy or JungleDisk. I’ve heard good things about Mozy, and I liked the transparency of JungleDisk… but a few things didn’t sit right with me. Paying monthly to store my half-terabyte of data is neither attractive nor cheap, for starters. Furthermore, Comcast’s throttled upload bandwidth and limited monthly transfer quota mean that pushing my data up to the cloud will be painful, and that getting it back will be similarly difficult. If I have to restore my data, I don’t want it to trickle in drop-by-drop over the course of several months… I want it here now.
- Use an online backup system for critical data and keep a hard drive offsite with a copy of everything else. Realistically, there isn’t much data that I couldn’t recreate in case of a disaster… my photos are already archived online at SmugMug, and music can be re-purchased. It seems natural, then, to back up my small amount of non-recoverable data online and come up with my own system to back up everything else. “My own system”, in this case, would mean shuttling a hard drive back and forth between home and work, refreshing it weekly with a backup of our RAID array. Here, alas, is the weak link in the plan… what happens when I forget to bring the drive home one week? What happens if the house burns down during that one night when the drive is at home? Do I really want to set up a system that’s going to be such a hassle?
- Keep a hard drive at work that stays synchronized with our RAID array. In the previous scenario, having an easily accessible hard drive with a recent backup of all our data is close to ideal, if only I didn’t have to cart it to and fro each week. What if I could leave it at work and have it synchronize itself automatically?
The more I thought about the last option, the more attractive it seemed to me. I have a Mac Mini that could handle the synchronization, an external drive enclosure that could host the backup, and an office where this system could live. Best of all, there’s already a robust, well-known software solution for this task: rsync.
Here’s how I set it up:
The result? The script wakes up at 2am each morning, looks for new or changed files on the RAID array, copies them over to the backup drive, and tells me all about it via Twitter.
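For readers who want something concrete, here is a minimal sketch of what a nightly script along these lines might look like. It is not my actual script, and nearly everything in it is an assumption made for illustration: the host name `fileserver.example.com`, the paths, the log location, the `RSYNC` override, and the sample crontab line are all hypothetical. The Twitter step is left out entirely; `report` just prints a summary that a notifier could send.

```shell
#!/bin/sh
# Sketch of a nightly pull-style backup. Hypothetical names throughout:
# the host, paths, and log location below are assumptions for illustration.
#
# Example crontab entry to run it at 2am every day (path is hypothetical):
#   0 2 * * * /Users/me/bin/raid-backup.sh --run

RSYNC="${RSYNC:-rsync}"                              # override to try the script without rsync
SOURCE="${SOURCE:-fileserver.example.com:/raid/}"    # hypothetical RAID share, reached over SSH
DEST="${DEST:-/Volumes/Backup/raid/}"                # hypothetical external drive mount point
LOG="${LOG:-/tmp/raid-backup.log}"

# Copy new and changed files from SOURCE to DEST.
#   --archive          recurse and keep permissions, timestamps, symlinks
#   --compress         compress data in transit
#   --itemize-changes  list each item that was updated
# --delete is deliberately absent, so files removed from the source
# remain on the backup drive.
run_backup() {
    "$RSYNC" --archive --compress --itemize-changes \
        "$SOURCE" "$DEST" >"$LOG" 2>&1
}

# Summarize the run. This sketch only prints the summary; sending it
# anywhere (Twitter, email) is left to the reader.
report() {
    status=$1
    lines=$(wc -l <"$LOG" | tr -d ' ')
    echo "backup exit=$status, $lines lines of rsync output"
}

# Run only when asked, so the functions can also be sourced and inspected.
if [ "${1:-}" = "--run" ]; then
    run_backup
    report $?
fi
```

Adding `--dry-run` to the rsync options is a safe way to preview what a run would transfer before pointing it at real data.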
Let’s see how well this does against the problems we identified earlier:
- Hardened against external circumstances. If something happens to my RAID array at home, the backup drive at work is still safe. (I think I can live with losing both copies of my data in case of a disaster affecting the entire Puget Sound area.)
- No single point of failure. If the RAID controller dies, I still have a backup at work. If the backup drive or Mac Mini dies, I just replace it with a new one.
- Deleted files are available for recovery. Since my backup script only copies added or changed files and never propagates deletions, a file I accidentally delete from the RAID array is still sitting on the backup drive.
That’s it — no monthly fee for storage, disaster recovery is just a trip to work and a Firewire cable, and I got the opportunity to hone my shell-scripting skills. I consider that a success!