HowTo: Design an offsite backup system

Everyone has experienced the heartbreak of a failed hard drive… that sickening realization that your music, photos, and important documents are gone, victims of statistical likelihood. Luckily, there exists RAID — a system that keeps your data mirrored on two drives so that you aren’t out of luck if one dies. Amanda and I have relied on it for years now, storing all our data on a dedicated file server whose sole purpose is to manage a RAID array and make it available to all the computers on our network.

But, as we ourselves discovered after returning from Christmas vacation, what happens when the file server won’t boot up anymore? We were able to recover our data by connecting our drives to a friend’s machine… but what if the house had burned down? Sure, we had two copies of our data — but both would have been reduced to ashes.

As I ran through these possibilities, a few shortcomings of my reliance on RAID alone came to light:

  • Both drives are equally vulnerable to external circumstances. Having two drives sitting right next to each other means they’ll both be compromised when a disaster occurs. Insurance can buy me new drives, but can’t recreate my data.
  • The RAID controller is a single point of failure. When one drive dies, you can swap in a fresh one — but what happens when the machine that the drives are connected to dies? Worse, if you’re using a hardware RAID implementation, will you be able to find compatible hardware to replace the original system?
  • RAID isn’t a backup. When you delete a file from the RAID array, that file is deleted from both drives. RAID protects you from hard drive failures, but can’t retrieve a file you accidentally deleted last week. (JournalSpace learned this the hard way.)

I came up with several ways to mitigate all three of these issues:

  • Use an online backup system like Mozy or JungleDisk. I’ve heard good things about Mozy, and I liked the transparency of JungleDisk… but a few things didn’t sit right with me. Paying monthly to store my half-terabyte of data is neither attractive nor cheap, for starters. Furthermore, Comcast’s throttled upload bandwidth and limited monthly transfer quota mean that pushing my data up to the cloud will be painful, and getting it back will be just as slow. If I have to restore my data, I don’t want it to trickle in drop-by-drop over the course of several months… I want it here now.
  • Use an online backup system for critical data and keep a hard drive offsite with a copy of everything else. Realistically, I don’t have that much data that I couldn’t recreate in case of a disaster… my photos are already archived online at SmugMug, and music can be re-purchased. It seems natural, then, to back up my small amount of non-recoverable data online and come up with my own system to back up everything else. “My own system”, in this case, would mean shuttling a hard drive back-and-forth between home and work, refreshing it weekly with a backup of our RAID array. Here, alas, is the weak link in the plan… what happens when I forget to bring the drive home one week? What happens if the house burns down during that one night when the drive is at home? Do I really want to set up a system that’s going to be such a hassle?
  • Keep a hard drive at work that stays synchronized with our RAID array. In the previous scenario, having an easily-accessible hard drive with a recent backup of all our data is close to ideal, if only I didn’t have to cart it to-and-fro each week. What if I could leave it at work and have it synchronize itself automatically?

The more I thought about the last option, the more attractive it seemed to me. I have a Mac Mini that could handle the synchronization, an external drive enclosure that could host the backup, and an office where this system could live. Best of all, there’s already a robust, well-known software solution for this task: rsync.

Here’s how I set it up:

  • Procure an external drive with enough capacity to back up the RAID array. I had a LaCie firewire drive sitting around, so I stuck in a more-efficient Western Digital Green Power drive and made an initial backup of our RAID array. Subsequent backups will only have to transfer new or changed files.
  • Hook up the Mac Mini and external drive at work and allow it to access my RAID array remotely. Mac OS X supports ssh, which allows remote users to securely log in to a computer system. Enabling ssh on my machine at home allowed the Mac Mini to connect to it and access the RAID array. I did some additional work to make this secure:
    • SSH runs on a non-standard port.
    • Incoming connections are only allowed from IP addresses owned by Microsoft.
    • SSH users can only log in under accounts that have strong passwords.
    • The root user can only log in using an SSH key (never a password), and that key can only execute the rsync command.
    • The only allowed protocol is SSH2.
  • Figure out the right backup command. I followed these directions to compile the latest version of rsync on my home machine and on the Mac Mini, then referred to several excellent resources to figure out exactly how to use it. Here’s what I ended up with:

    /usr/local/bin/rsync -aNHAXxz --human-readable --max-size=2G --stats --timeout=999 --numeric-ids --protect-args --fileflags --force-change --exclude-from=/raid_backup.excludes --log-file=/var/log/raid_backup.log --rsync-path="/usr/local/bin/rsync --log-file /var/log/raid_backup.log" -e 'ssh -2 -c arcfour -o Compression=no -x -p port_number' root@my_imac:/path/to/RAIDarray/ /path/to/backupdrive/

    Note that I didn’t specify the --delete option, which means that files deleted on the RAID array won’t be removed from the backup drive, allowing me to recover accidentally-deleted files. Eventually I’d like to implement a rotating snapshot system, but for now I’ll manually run a separate rsync command to clean up deleted files when the drive is close to full.

    Since I had extra room on the backup drive, I wrote a similar rsync command to back up the user directories on our home machine as well.

  • Wrap the backup command in a script that sends me notifications. I want to know whether my backup completed successfully, so I gave my Mac Mini a Twitter account, figured out how to send tweets from the command-line, and glued it to my backup command:

    #!/bin/sh

    rm /var/log/raid_backup.log 2>/dev/null
    touch /var/log/raid_backup.log

    RESULT=`/usr/local/bin/rsync -aNHAXxz --human-readable --max-size=2G --stats --timeout=999 --numeric-ids --protect-args --fileflags --force-change --exclude-from=/raid_backup.excludes --log-file=/var/log/raid_backup.log --rsync-path="/usr/local/bin/rsync --log-file /var/log/raid_backup.log" -e 'ssh -2 -c arcfour -o Compression=no -x -p port_number' root@my_imac:/path/to/RAIDarray/ /path/to/backupdrive/`

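    # $? still holds rsync's exit status from the command substitution above.
    EXIT="$?"
    # Unquoted, $RESULT collapses onto one line, so awk picks the numbers out
    # by word position; these indexes depend on the exact --stats layout.
    NUMFILES=`echo $RESULT | awk '// {print $9}'`
    RECVD=`echo $RESULT | awk '// {print $52}'`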
    /usr/local/bin/twitter "RAID: received $NUMFILES files for $RECVD, exit code $EXIT"

    /usr/local/bin/twitter "`diskutil info /Volumes/Teradrive/ | awk '/Free Space/ {print "Space remaining: " $3 " " $4}'`"

    When run, this script deletes the old log file, performs the backup, and tweets the results (plus the space remaining on the backup drive) when it’s finished.

  • Schedule the backup script to run nightly. Mac OS X uses launchd to manage periodic processes, so I handed it a configuration file that runs my backup script every day at 2am.
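
For readers who haven’t written a launchd job before, here is a minimal sketch of what such a configuration file might look like, generated from a small shell function. The label (com.example.raid-backup) and the script path are hypothetical placeholders, not the values used on the Mac Mini; only the Label, ProgramArguments, and StartCalendarInterval keys are shown.

```shell
#!/bin/sh
# Sketch: write a launchd job definition that runs a script daily at 02:00.
# The label and script path passed in are hypothetical placeholders.

write_backup_plist() {
    label="$1"     # e.g. com.example.raid-backup (hypothetical)
    script="$2"    # e.g. /usr/local/bin/raid_backup.sh (hypothetical)
    out="$3"       # where to write the .plist file

    cat > "$out" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>${label}</string>
    <key>ProgramArguments</key>
    <array>
        <string>${script}</string>
    </array>
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>2</integer>
        <key>Minute</key>
        <integer>0</integer>
    </dict>
</dict>
</plist>
EOF
}
```

Calling something like `write_backup_plist com.example.raid-backup /usr/local/bin/raid_backup.sh /Library/LaunchDaemons/com.example.raid-backup.plist` and then `sudo launchctl load` on the resulting file should register the job, assuming the script runs as root from the system-wide LaunchDaemons folder.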

The result? The script wakes up at 2am each morning, looks for new or changed files on the RAID array, copies them over to the backup drive, and tells me all about it via Twitter.
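
The notification step in the script picks its two numbers out of rsync’s --stats summary by word position, which works until the output layout shifts. A slightly sturdier variation, sketched here under the assumption that the summary contains a “files transferred:” line and a “sent … bytes  received … bytes” line (check your own log, since the wording differs a little between rsync versions), is to match on those labels instead. The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: pull summary numbers out of rsync --stats output by label rather
# than by word position. Both functions read the rsync output on stdin.

# Prints the count from the first "... files transferred: N" line.
files_transferred() {
    awk '/files transferred:/ {print $NF; exit}'
}

# Prints the amount from the "sent X bytes  received Y bytes ..." line.
bytes_received() {
    awk '$1 == "sent" && $4 == "received" {print $5; exit}'
}
```

In the backup script this would look like `NUMFILES=$(printf '%s\n' "$RESULT" | files_transferred)`. Note the quotes around `$RESULT`: they keep the line breaks that label matching depends on.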

Let’s see how well this does against the problems we identified earlier:

  • Hardened against external circumstances. If something happens to my RAID array at home, the backup drive at work is still safe. (I think I can live with losing both copies of my data in case of a disaster affecting the entire Puget Sound area.)
  • No single point of failure. If the RAID controller dies, I still have a backup at work. If the backup drive or Mac Mini dies, I just replace it with a new one.
  • Deleted files are available for recovery. Since my backup script never removes anything from the backup drive (no --delete), a file deleted from the RAID array is still sitting on the backup, and I can easily recover from accidental deletes.
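
The rotating snapshot system mentioned earlier as a future improvement is not part of this setup, but one common way to approach it is rsync’s --link-dest option, which hard-links unchanged files against a previous copy so each dated snapshot only costs the space of what changed. The sketch below is an assumption-heavy illustration: the dated directory layout, the `latest` symlink, and the `snapshot` function are all hypothetical, the snapshot root is assumed to be an absolute path, and the extra flags and ssh transport from the real backup command are left out.

```shell
#!/bin/sh
# Sketch: dated snapshots that share unchanged files through hard links.
# RSYNC can be overridden (for example RSYNC=echo) to preview the command.

RSYNC="${RSYNC:-rsync}"

# snapshot SRC SNAPSHOT_ROOT [STAMP]
# Copies SRC into SNAPSHOT_ROOT/STAMP. If a previous snapshot exists, files
# that have not changed are hard-linked from SNAPSHOT_ROOT/latest.
snapshot() {
    src="$1"
    root="$2"                          # assumed to be an absolute path
    stamp="${3:-$(date +%Y-%m-%d)}"    # defaults to today's date

    mkdir -p "$root" || return 1
    if [ -e "$root/latest" ]; then
        "$RSYNC" -a --link-dest="$root/latest" "$src" "$root/$stamp/" || return 1
    else
        "$RSYNC" -a "$src" "$root/$stamp/" || return 1
    fi
    # Point "latest" at the snapshot just taken.
    ln -sfn "$stamp" "$root/latest"
}
```

With this layout, freeing space means deleting the oldest dated directories rather than running a separate cleanup pass, and a file deleted last week can still be found in last week’s snapshot.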

That’s it — no monthly fee for storage, disaster recovery is just a trip to work and a Firewire cable, and I got the opportunity to hone my shell-scripting skills. I consider that a success!
