Backing up a Server to Amazon S3

Motivation

When deploying a server on the internet, you always have to deal with security issues. You harden your server by setting up encrypted connections, configuring a tight firewall, and putting critical services in a chroot jail. However, what happens if an intruder breaks into your server and deletes your content? Or if you make a mistake and erase some data? The last line of defense is a good backup strategy.

The question is what to back up and where to back it up. For me, the holy grail would be a fast, reliable, file-system-based backup solution like the snapshot feature in ZFS. It should then be possible to sync these snapshots to a remote location in a bandwidth-efficient manner.

Using Duplicity for Backup

There are a lot of different solutions around, and they differ in security, price and reliability. Duplicity is one of them. It is relatively easy to set up if you have a simple backup problem, such as backing up a web server and its corresponding database. The neat thing about Duplicity is that it makes encrypted, incremental backups using standard file formats. For the incremental part it relies on the rsync algorithm (via librsync, the library behind rdiff) to do the heavy lifting, and it uses GPG to encrypt the backup with a public/private key pair. Another benefit is that Duplicity supports Amazon S3 out of the box, so you can store your backups in the cloud in a safe manner. Because most runs are incremental, the costs for traffic and storage are kept to a minimum. In my case, a daily backup of the server's configuration and blog, I have never paid anything, because the charges stay below the minimum monthly billing amount.
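As a rough illustration of such a setup, the following Python sketch assembles a Duplicity command line for an encrypted backup to S3. The bucket name, source directory and key ID are placeholders, and the exact S3 URL scheme depends on the Duplicity version you have installed, so treat the URL as an assumption. The options `--encrypt-key` and `--full-if-older-than` are regular Duplicity options; the AWS credentials are passed through environment variables.

```python
import os

def duplicity_command(source_dir, target_url, gpg_key_id, full_every="30D"):
    """Build the argument list for an encrypted, incremental Duplicity run."""
    return [
        "duplicity",
        "--encrypt-key", gpg_key_id,         # encrypt with this GPG public key
        "--full-if-older-than", full_every,  # start a fresh full backup periodically
        source_dir,                          # what to back up
        target_url,                          # where to store it
    ]

def duplicity_environment(access_key, secret_key):
    """Copy the current environment and add the AWS credentials."""
    env = dict(os.environ)
    env["AWS_ACCESS_KEY_ID"] = access_key
    env["AWS_SECRET_ACCESS_KEY"] = secret_key
    return env

# Placeholder values, not taken from a real setup.
cmd = duplicity_command("/etc", "s3://example-backup-bucket/etc", "ABCD1234")
print(" ".join(cmd))

# To really run it (needs Duplicity, a GPG key and valid credentials):
#   import subprocess
#   subprocess.run(cmd, env=duplicity_environment("...", "..."), check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.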

Using a file server in the cloud as the backup destination has its benefits, especially when you need to restore. You can then rely on Amazon's bandwidth for a fast restore instead of your home cable or DSL connection.

What Happens During a Backup?

In the beginning, when there are no previous backups to base an incremental backup on, Duplicity performs a full backup. Its backup files use the standard tar format; each archive is encrypted using GPG and your public key and then uploaded to the remote server.
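The pack-then-encrypt idea can be sketched in a few lines of Python. This is not Duplicity's actual code, only an illustration of the steps: `make_tar_volume` and `gpg_encrypt_command` are hypothetical helper names, the key ID is a placeholder, and the upload step is left out.

```python
import io
import tarfile
import tempfile
from pathlib import Path

def make_tar_volume(source_dir):
    """Pack every regular file under source_dir into an in-memory tar archive."""
    root = Path(source_dir)
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # Store paths relative to the backup root.
                archive.add(str(path), arcname=path.relative_to(root).as_posix())
    return buffer.getvalue()

def gpg_encrypt_command(volume_path, gpg_key_id):
    """Argument list that encrypts volume_path for the given public key."""
    return ["gpg", "--encrypt", "--recipient", gpg_key_id,
            "--output", volume_path + ".gpg", volume_path]

# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "index.html").write_text("<h1>blog</h1>")
    volume = make_tar_volume(tmp)
    print(len(volume), "bytes of tar data")

print(" ".join(gpg_encrypt_command("volume1.tar", "ABCD1234")))
```

Because only the public key is needed for encryption, the private key never has to be stored on the server that is being backed up.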

The next backup is an incremental backup. Duplicity first checks whether its local cache of metadata from the previous backups (signatures and manifests) is in sync with the remote repository. If it is not, Duplicity downloads the missing metadata, because it needs it to generate the diff of the most recent changes. It then calculates the diff, encrypts it, and uploads it together with a hash of the diff to the remote server.
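To make the idea of such a diff more concrete, here is a deliberately simplified Python sketch. It is not how Duplicity or librsync are implemented: the real rsync algorithm uses rolling checksums to find matching blocks at arbitrary offsets, whereas this toy version only compares aligned, fixed-size blocks. The function names and the tiny block size are assumptions made for the example. The point it illustrates is that a delta can be computed from the signature of the old data alone, without having the old data at hand.

```python
import hashlib

BLOCK_SIZE = 4  # tiny on purpose so the example is easy to follow

def signature(data, block_size=BLOCK_SIZE):
    """Map the hash of each aligned block of data to its block index."""
    sig = {}
    for offset in range(0, len(data), block_size):
        digest = hashlib.sha256(data[offset:offset + block_size]).hexdigest()
        sig.setdefault(digest, offset // block_size)
    return sig

def make_delta(sig, new_data, block_size=BLOCK_SIZE):
    """Describe new_data as copies of known blocks plus literal new bytes."""
    delta = []
    for offset in range(0, len(new_data), block_size):
        block = new_data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if digest in sig:
            delta.append(("copy", sig[digest]))  # block already in the old backup
        else:
            delta.append(("data", block))        # new content that must be stored
    return delta

def apply_delta(old_data, delta, block_size=BLOCK_SIZE):
    """Rebuild the new data from the old data and a delta (the restore case)."""
    out = bytearray()
    for kind, value in delta:
        if kind == "copy":
            start = value * block_size
            out += old_data[start:start + block_size]
        else:
            out += value
    return bytes(out)

old = b"aaaabbbbcccc"
new = b"aaaaXXXXcccc"
delta = make_delta(signature(old), new)
print(delta)
print(apply_delta(old, delta) == new)
```

Only the `("data", ...)` entries carry new bytes, which is why an incremental backup of mostly unchanged files stays small.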

Backing up a Database

A web application often consists not only of files but also of databases, and both need to be backed up to be able to fully restore the website. In my case I am using a MySQL server. I first run mysqldump and then back up the resulting dump file. I have written a small helper script in Perl that is kicked off by a cron job.
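The dump-then-backup order can be sketched as follows. This is a Python illustration of the approach, not the Perl script mentioned above. The user name, dump directory, bucket URL and key ID are placeholders, and the sketch assumes the MySQL password comes from a client option file such as `~/.my.cnf` rather than from the command line.

```python
import subprocess

def mysqldump_command(dump_path, user="backup"):
    """Argument list that dumps all databases into a single SQL file."""
    return [
        "mysqldump",
        "--user=" + user,
        "--single-transaction",  # consistent snapshot for InnoDB tables
        "--all-databases",
        "--result-file=" + dump_path,
    ]

def backup_steps(dump_dir, target_url, gpg_key_id):
    """Return the commands in the order they must run: dump first, then back up."""
    dump_path = dump_dir + "/all-databases.sql"
    return [
        mysqldump_command(dump_path),
        ["duplicity", "--encrypt-key", gpg_key_id, dump_dir, target_url],
    ]

def run_backup(steps, dry_run=True):
    """Print the commands, or execute them and stop at the first failure."""
    for cmd in steps:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)

# Placeholder values; dry_run=True only prints what would be executed.
steps = backup_steps("/var/backups/mysql", "s3://example-backup-bucket/mysql", "ABCD1234")
run_backup(steps, dry_run=True)
```

Using `check=True` makes sure that a failed dump aborts the run, so a stale or empty dump file is not uploaded as if it were a fresh backup. A crontab entry such as `30 2 * * * /usr/local/bin/backup.py` (a hypothetical path) would run a script like this every night at 02:30.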

I have been using this setup for quite a while now and it is working very nicely. As a starting point, I published the backup wrapper script on GitHub.

Thomas

Chemist, Programmer, Mac and iPhone enthusiast. Likes coding in Python, Objective-C and other languages.
