Automatic macOS Scheduled Backups

In which I build a custom backup solution with rsync and a LaunchDaemon

I use a Mac Mini as a file server and hypervisor. This very site is running in a Docker container inside VirtualBox on the aforementioned Mac Mini.

There's a 6tb drive mounted to the Mac Mini on which many "important" files live. In the past I periodically copied them over, but computers are supposed to do this stuff automatically. That's like the whole point or whatever.

The tools we'll use are rsync and launchd. rsync is the classic tool commonly found on Unix-like operating systems, used for synchronizing files between two locations. It's robust, powerful, and popular. launchd is the macOS service manager that starts and supervises daemons, agents, and other processes. We'll use launchd in a manner similar to cron.

First, let's get rsync going. We have three physical volumes:

  • /Volumes/Shared/ - Where all the shared stuff is stored, a 3tb drive
  • /Volumes/Media/ - Where a bunch of media is stored, a 1tb drive
  • /Volumes/Backup/ - Our backup destination, a 6tb drive

To make a very simple backup, you could simply copy everything like this:

# cp -r /Volumes/Shared/ /Volumes/Backup/

but this kinda sucks because ALL files are copied which takes forever and will fail if you sneeze too hard.

Enter rsync. With rsync, we have a lot of interesting options. There are many other websites that probably explain rsync much better, so we'll just worry about what we need for our purposes.

Starting with pseudocode is helpful when doing anything with computers aside from checking your social media accounts for trash memes. Here's what we want to accomplish with our tools:

Every day at 3am:

  • Copy all files and folders from /Volumes/Shared/ to /Volumes/Backup/...
    • Preserving all file ownerships and permissions
    • Skipping anything that already exists on /Volumes/Backup/
    • Skipping hidden files and folders
    • Deleting anything from /Volumes/Backup/ that doesn't exist on /Volumes/Shared/
    • Continuing if we stumble across any strange errors that could stop the rsync
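Before letting anything delete files for real, it helps to see what a sync would do. Here's a minimal sketch of a dry run, assuming a reasonably standard rsync is installed. The helper name preview_sync is made up for this example; the flags are rsync's own (-n for dry run, -i to itemize changes).

```shell
#!/bin/bash

# preview_sync SRC DST
# Hypothetical helper: list what a mirror-style sync from SRC to DST would
# copy or delete, without touching either side.
preview_sync() {
    local src="$1" dst="$2"
    # -a: archive mode (permissions, ownership, timestamps, recursion)
    # -n: dry run, nothing is actually copied or deleted
    # -i: print one itemized line per change
    # --delete: report files that exist only on DST and would be removed
    # --exclude=".*": skip hidden files and folders
    # The ${var%/}/ form guarantees exactly one trailing slash, so rsync
    # syncs the contents of SRC rather than SRC itself.
    rsync -ani --delete --exclude=".*" "${src%/}/" "${dst%/}/"
}

# Example (paths from this post):
#   preview_sync /Volumes/Shared /Volumes/Backup/Shared
```

Lines starting with "*deleting" are files that would be removed from the destination. If that list looks scary, better to find out now than at 3am.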

The first sync is going to take forever, but subsequent syncs will be much faster because most of the files will already exist on the destination - We're only syncing the things that have changed since the last sync.

You might be thinking "Isn't this basically Dropbox or Time Machine?" Yup, you're right, it's very similar. However, Dropbox costs cash money per GB and I have a ton of data to back up. Also, I can't use Time Machine because that volume is already being used as a Time Machine destination for our laptops and I got some weird error about the wrong filesystem when I tried to enable it. I couldn't figure out how to enable Time Machine for this server on its own attached external volume, while also sharing it on the network.

Anyway, here are the rsync commands we'll use for our backups:

# rsync -arv /Volumes/Media/ /Volumes/Backup/Media/ --exclude=".*" --delete --ignore-errors
# rsync -arv /Volumes/Shared/ /Volumes/Backup/Shared/ --exclude=".*" --delete --ignore-errors

To break down the options we need:
-a: Archive: Preserve permissions, ownership, timestamps, etc. This also turns on recursion
-r: Recursive: Also get all the subfolders and their content. Since -a already implies this, it's redundant here but harmless
-v: Verbose: Get more verbose output
--exclude=".*": macOS uses hidden files for things like Spotlight, Trash, Time Machine, and a bunch of other crap. We don't need to back those up since macOS creates them as needed. Also we don't really have permission because I think the system owns it all. Hidden filenames and folder names begin with a "."
--delete: Delete items FROM the destination that DO NOT exist on the source. So if I delete "catvid.mp4" because it sucked, it will also be deleted from the backup.
--ignore-errors: This volume has files over a decade old, so there are bound to be some corrupt files. rsync already carries on past individual files it can't read, but when it hits I/O errors it normally skips the --delete step to be safe. The man page describes this option as "delete even if there are I/O errors", so it keeps the deletions happening instead of letting a few random errors leave stale files on the backup. I'll clean this volume up eventually I swear.
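One thing worth guarding against with --delete: if a source folder exists but is empty (say a drive failed to mount and left an empty folder behind), rsync will faithfully empty the backup to match. Here's a minimal sketch of a sanity check to run before each sync. The function name require_populated_dir is made up for this example, and "has at least one visible entry" is just my assumed definition of "looks mounted".

```shell
#!/bin/bash

# require_populated_dir DIR
# Hypothetical guard: succeed only if DIR exists and contains at least one
# visible (non-hidden) entry.
require_populated_dir() {
    local dir="${1%/}" entry
    [[ -d "$dir" ]] || return 1
    for entry in "$dir"/*; do
        # If the glob matched nothing it stays literal, so confirm the
        # entry really exists before calling the directory populated.
        [[ -e "$entry" ]] && return 0
    done
    return 1
}

# Example (paths from this post):
#   if require_populated_dir /Volumes/Shared; then
#       rsync -av /Volumes/Shared/ /Volumes/Backup/Shared/ --exclude=".*" --delete
#   else
#       echo "Source looks empty or unmounted, skipping sync."
#   fi
```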

So that's easy enough. We're just syncing files from the network share to the backup. Buuuuut I also want to back up the VM that hosts this website. My webserver is typically not very busy, so it might just work to rsync the .vdi and metadata over. However, if the VM is doing something, the .vdi is probably being modified, and so the backup will probably end up corrupted. So here's some pseudocode for what we need to do:

Every day at 3am:

  • If the VM that hosts this website is running...
    • Send the graceful shutdown signal
    • For 5 minutes, check to see if the VM is shutdown yet
    • When the VM reports that it's shutdown, start a backup
    • When the backup is complete, power the VM back on

This does mean that this site will be offline for about 15 minutes while the backup completes (You can check for yourself), but that's ok. The server logs show very little traffic to this site, especially at 3am. There's probably a better way to do this, like instead of saving the entire VM, use Git to grab the important stuff. Maybe I'll build that later.
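The "wait up to 5 minutes for shutdown" step is really just polling with a timeout, which can be written once and reused. Here's a minimal sketch. The names wait_for and vm_is_off are made up for this example; the VBoxManage call is the same state check used in the script in this post.

```shell
#!/bin/bash

# wait_for TIMEOUT CMD [ARGS...]
# Hypothetical helper: run CMD once per second until it succeeds or TIMEOUT
# seconds have passed. Returns 0 on success, 1 on timeout.
wait_for() {
    local timeout="$1"
    shift
    local waited=0
    until "$@"; do
        if (( waited >= timeout )); then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# vm_is_off NAME
# Succeeds when VirtualBox reports the named VM as powered off.
vm_is_off() {
    VBoxManage showvminfo --machinereadable "$1" | grep -q '^VMState="poweroff"'
}

# Example:
#   if wait_for 300 vm_is_off web; then
#       echo "web is down, safe to back up"
#   else
#       echo "Timeout, skipping this backup."
#   fi
```

The upside of splitting it this way is that the timeout logic can be tried out with any command, without having to shut down a real VM every time.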

Here's the entire script. You'll see the vboxmanage command, which is used to control the VM, and a for loop that checks to see if the VM is safely shutdown before starting the sync. The VM is simply called "web":

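#!/bin/bash

# launchd runs jobs with a minimal PATH. This assumes VBoxManage lives in
# /usr/local/bin, which is where VirtualBox usually links it on macOS; adjust if yours differs.
PATH="/usr/local/bin:$PATH"

# Sync Media and Shared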
rsync -arv /Volumes/Media/ /Volumes/Backup/server/Media/ --exclude=".*" --delete --ignore-errors
rsync -arv /Volumes/Shared/ /Volumes/Backup/server/Shared/ --exclude=".*" --delete --ignore-errors

# The VM part
# Timeout in seconds. If the machine doesn't shut down within 5 mins, something's wrong, so skip the backup to be safe.
TIMEOUT=300

# Boolean to tell us to backup or not
BACKUP=true

# Safely shutdown web VM with an ACPI call
vboxmanage controlvm web acpipowerbutton

# The loop that will run for 5 minutes
for ((i=TIMEOUT; i>-1; i--))
do
    echo "Waiting $i more seconds for web to shutdown gracefully..."

    # Pause for 1 second
    sleep 1

    # Check the running state of the VM. If it's "poweroff" then exit the loop and start the backup.
    if VBoxManage showvminfo --machinereadable web | grep -q '^VMState="poweroff"'; then
        BACKUP=true
        break
    fi

    # If we've met the timeout value, then something's wrong and we should exit and skip the backup.
    if [[ "$i" -eq 0 ]]; then
        BACKUP=false
        break
    fi    
done

if [[ "$BACKUP" == true ]]; then
    # Start the backup
    mkdir -p /Volumes/Backup/server/Storage/VirtualBox\ VMs/web/
    rsync -arv /Volumes/Storage/VirtualBox\ VMs/web/ /Volumes/Backup/server/Storage/VirtualBox\ VMs/web/ --delete --ignore-errors
elif [[ "$BACKUP" == false ]]; then
    echo "Timeout, skipping this backup."
    # Should also figure out how to send an alert email.
fi

# Start the VM back up.
vboxmanage startvm web --type headless


Running this by itself seems to work! Now, let's make it into a Daemon so it happens automatically.
First, create a .plist file in /Library/LaunchDaemons. Here's a wizard: http://launched.zerowidth.com

In the .plist file I created, you can see that I'm running the /Volumes/Backup/backup.sh script I created, as myself (fred), every day at 3am.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.fcm.backup</string>
    <key>ProgramArguments</key>
    <array>
        <string>/Volumes/Backup/backup.sh</string>
    </array>
    <key>UserName</key>
    <string>fred</string>
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>3</integer>
        <key>Minute</key>
        <integer>0</integer>
    </dict>
</dict>
</plist>

Then set appropriate permissions/ownership for this .plist file (the user "fred" is a member of wheel):

# chown root:wheel /Library/LaunchDaemons/com.fcm.backup.plist
# chmod 644 /Library/LaunchDaemons/com.fcm.backup.plist

Next, we need to make the script executable, fix the ownership, and set appropriate permissions:

# chown root:wheel /Volumes/Backup/backup.sh
# chmod a+x /Volumes/Backup/backup.sh

Finally, let's enable the Daemon:

# launchctl load -w /Library/LaunchDaemons/com.fcm.backup.plist

That's it, we now have an automated task to run a backup script every day at 3am that is free and won't break our VM!
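Since the job now runs unattended at 3am, nobody is around to see rsync's verbose output or the timeout message. Here's a minimal sketch of a timestamped log helper that could go at the top of the backup script. The log function and the LOG_FILE path are assumptions for this example, so pick any location the job's user can write to.

```shell
#!/bin/bash

# Hypothetical log location; override by setting LOG_FILE before running.
LOG_FILE="${LOG_FILE:-$HOME/backup.log}"

# log MESSAGE...
# Append one line to the log file, prefixed with the current date and time.
log() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOG_FILE"
}

# Example:
#   log "Backup started"
#   rsync -av /Volumes/Media/ /Volumes/Backup/Media/ --delete >> "$LOG_FILE" 2>&1
#   log "Backup finished"
```

If a backup ever gets skipped, the log at least says when and why.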

This is a decent solution for static files like media, but for this website, this is kind of a caveman way to do a backup as it captures the entire VM. All we really care about is the site data. If the VM exploded, it would be better and faster to rebuild the VM and container, and restore the site data. Maybe a post on that later.

Great commercial options do exist for backups such as Carbon Copy Cloner, SuperDuper, Time Machine, and Dropbox. You could certainly use one of those if you wanted, but I really like the flexibility that comes with doing it manually like this.

References:
https://www.createdbypete.com/a-practical-guide-to-using-rsync/
https://ss64.com/bash/rsync.html
https://www.launchd.info
https://davidhamann.de/2018/03/13/setting-up-a-launchagent-macos-cron/

Written by fred on Friday July 5, 2019