Backups are great, and they’re not rocket science. I’m writing up how we do backups, not because I think it’s a cool or unique setup (because it’s not), but to highlight how effective a simple solution can be.
We use rsync to keep a local copy of everything on our web host, without wasting bandwidth re-downloading files that haven’t changed. The layout is simple:
Our hosting provider is accessible via ssh, and the backup box we use is a Raspberry Pi model B, costing (more or less) 50 AUD to get running.
On the server
We back up the databases with mysqldump. To do this without a password prompt, put the database user’s credentials in a .my.cnf file in the home directory, and then something like this will do the trick:
#!/bin/sh
# Stop at the first failed command, so a failed dump is never compressed
set -e
# Remove old dump
rm -f database.sql.gz
# Dump and compress database
mysqldump -h sql.example.com --all-databases > database.sql
gzip database.sql
The above script is saved as database-dump.sh on the server. The backup box runs it over ssh to refresh the database dump just before copying all the files.
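For the .my.cnf file mentioned above, here is a minimal sketch of one way to create it. The helper name write_my_cnf and the example user and password are made up for illustration. The [client] section is one that mysqldump reads, and the file is kept readable by its owner only because it holds a password in plain text.

```shell
# Hypothetical helper: write a client options file for mysqldump.
# Usage: write_my_cnf FILE USER PASSWORD
write_my_cnf() {
    (
        umask 077   # a newly created file is readable by its owner only
        printf '[client]\nuser=%s\npassword=%s\n' "$2" "$3" > "$1"
    ) && chmod 600 "$1"   # also tighten a file that already existed
}
```

On the server this might be run once as write_my_cnf "$HOME/.my.cnf" backup_user 'secret'. A password containing characters such as # may need quoting inside the file, so check the result by running mysqldump by hand before relying on it.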
On the backup box
First, a script to get the files. Set up key-based (password-less) ssh login, for example with ssh-copy-id, so that this can run non-interactively:
#!/bin/sh
# Update the database dump
ssh user@host.example.com './database-dump.sh'
# Get files
rsync -avz --delete-during user@host.example.com:/home/user .
We also save a dated archive of the files each time the script runs, so we can go back in time to recover things that were later deleted. At the end of the above script:
mkdir -p archive
now=$(date +"%Y-%m-%d")
tar -czf archive/backup-$now.tar.gz user
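Getting a deleted file back out of one of these dated archives could look something like the sketch below. The helper name restore_from_archive, the example date, and the file path are assumptions for illustration. It unpacks into a separate directory so the live copy is not overwritten.

```shell
# Hypothetical helper: pull one path out of a dated archive without
# touching the live copy.
# Usage: restore_from_archive ARCHIVE PATH_IN_ARCHIVE DEST_DIR
restore_from_archive() {
    mkdir -p "$3" || return 1
    # -x extract, -z gunzip, -f archive file, -C unpack under DEST_DIR
    tar -xzf "$1" -C "$3" "$2"
}

# To see what an archive holds before restoring (example date):
#   tar -tzf archive/backup-2024-01-07.tar.gz
```

For example, restore_from_archive archive/backup-2024-01-07.tar.gz user/public_html/index.html restored would leave the file under restored/user/public_html/.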
Our data doesn’t change much from day to day, so we have cron run the backup script weekly on the backup box. See man crontab for how to set this up.
What backup is not
If you think you shouldn’t be doing backups, you’re wrong. The following are not good excuses:
- Trust: whoever is looking after the data won’t lose it.
Our host is pretty good, but their terms of service say they won’t be responsible for any data loss. Even providers which have support agreements can make mistakes. You’ll also be able to work faster if you’re not paranoid about any mistake being unrecoverable.
- Expense: it’s a nice idea but not worth it.
It’s dirt cheap, you can learn to do it yourself, and once set up it requires virtually no administration. If your organisation can’t afford some kind of backup solution, then it should probably stop using data in any form.
- RAID: I invested money in RAID, so I don’t need backups.
If you accidentally delete something, or notice that some of your files have been tampered with, then RAID will not help you. If there is a problem (e.g. a fire) at the hosting location, then you will be in trouble regardless of disk redundancy.