Personal Backups with rdiff-backup
What is rdiff-backup
Quoting their web page:
rdiff-backup backs up one directory to another, possibly over a network. The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago. The idea is to combine the best features of a mirror and an incremental backup. rdiff-backup also preserves subdirectories, hard links, dev files, permissions, uid/gid ownership, and modification times. Also, rdiff-backup can operate in a bandwidth efficient manner over a pipe, like rsync. Thus you can use rdiff-backup and ssh to securely back a hard drive up to a remote location, and only the differences will be transmitted. Finally, rdiff-backup is easy to use and settings have sensical defaults.
What We Are Going To Do
In this article I will not give a tutorial on rdiff-backup; I will just use it as a simple personal backup tool. For this I could use a mirroring tool, which would be OK, but rdiff-backup is both easy to get and easy to use and, most important, I know how it works ;-) [1]
So, here's the general plan:
Backup your home folder, or some part of it.
Do it efficiently, both in CPU usage and in disk usage.
Keep a history of changed files (and only of changed files). In fact, rdiff-backup will only keep the changed pieces (the "delta") along with the full version.
Do it automatically.
Do it onto a folder owned by another user, so a moment of insanity (rm -Rf / as a regular user) or a trojan doesn't destroy the backups.
What This Is Not
This is not a real, or serious backup solution!
We have no disaster recovery strategy
If the system really goes to heaven, the backups do too
If root goes insane, he can wipe all backups
So, why bother? Because this is easy, and not having any backups is worse. And tell me the truth... do you have any backups of your personal folders?
I am of the opinion that if the real solution is hard, it's often a good idea to spend the effort it takes to really solve the problem.
But I am also of the opinion that sometimes, if the real solution is too hard, most people are not going to care enough about the problem to actually solve it. And in those cases, there is room for an almost good enough non-solution.
The Real Stuff
If you want to perform real, serious backups, I recommend you look at Amanda or Bacula or some other of the plethora of backup solutions available, define a policy, and start doing offsite (or at least off-system) backups.
Some are good, some are very good, almost all of them are better than this article's suggestion (in some way at least ;-)
Getting rdiff-backup
On their homepage you can find Fedora RPMs, and binaries for some other OSs or distributions.
I am using Red Hat 9, and compiled it from the .src.rpm without problems, after getting librsync from the same page, also in .src.rpm format.
Backing Up Your Stuff With rdiff-backup
Step 1
As root, create a folder where your backups will go. I will use /home/backups in this example; you can use whatever you want, but:
If it's on another disk, it will be faster
It will use a large amount of space
The space thing is like this: at first, the backup will be just as large as whatever you back up. Then it will grow with time, maybe to many times that size, depending on how many older versions of your data you want to keep.
If you are not root, then you can still do it, but the backup will lose some of its properties (for example, a trojan or a mistake would be able to delete it!)
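Step 1 could look like the sketch below. The function name make_vault is made up for this example, and the 700 mode is just one possible choice (Step 3 discusses the trade-offs). The chown is skipped when the script is not running as root, in which case the folder stays owned by you and loses the protection described above.

```shell
#!/bin/sh
# Hypothetical helper (not part of rdiff-backup): create the backup
# folder and close it to other users.
make_vault() {
    dir=$1
    mkdir -p "$dir" || return 1
    # Only the owner may enter, read, or write the folder.
    chmod 700 "$dir" || return 1
    # Handing the folder over to root only works when we are root.
    if [ "$(id -u)" -eq 0 ]; then
        chown root:root "$dir" || return 1
    fi
}

# Usage (as root): make_vault /home/backups
```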
Step 2
Decide what to back up. For example, I want to back up /home/ralsina/projects, where all my in-progress stuff is stored, and /home/ralsina/.kde, where the configuration (and more) of KDE is saved.
Step 3
Let's try it. Here's a shell script for my example:
    #!/bin/sh

    # Where the backups go
    target=/home/backups
    mkdir -p "$target"

    # Just put all the folders you want backed up here. If they
    # have strange characters in the name, just put them in
    # single quotes. If you are not sure, quote them anyway.
    for folder in /home/ralsina/projects /home/ralsina/.kde
    do
        # If the folder being backed up contains ".."
        # something bad is going on.
        # -F makes grep take ".." literally (as a regular expression
        # it would match any two characters), -q keeps it quiet.
        if echo "$folder" | grep -qF '..'
        then
            continue
        fi
        mkdir -p "$target/$folder"
        rdiff-backup "$folder" "$target/$folder"
    done

    chown -R root:root "$target"
    chmod -R o-w "$target"
The purpose of the chown and chmod at the end is to make sure no one can modify the backups. Think of it as storing them in a vault.
If you are backing up stuff from various users, that will be wrong since it's making root own everything.
Here's an alternative:
    #!/bin/sh

    # Where the backups go
    target=/home/backups
    mkdir -p "$target"

    # Just put all the folders you want backed up here. If they
    # have strange characters in the name, just put them in
    # single quotes. If you are not sure, quote them anyway.
    for folder in /home/ralsina/projects /home/ralsina/.kde
    do
        # If the folder being backed up contains ".."
        # something bad is going on.
        # -F makes grep take ".." literally (as a regular expression
        # it would match any two characters), -q keeps it quiet.
        if echo "$folder" | grep -qF '..'
        then
            continue
        fi
        mkdir -p "$target/$folder"
        rdiff-backup "$folder" "$target/$folder"
    done

    chown root:root "$target"
    chmod 700 "$target"
This version preserves the owners of everything, but it "locks" the entrance to the backup folder. This way, the backups are safe from accidents, but only root can restore stuff... which may be too inconvenient.
You can just remove the chown/chmod commands, and that version will keep backups, but not protect them against the user himself.
So, take your backup script, save it somewhere with a reasonable name like, in my case, /usr/local/bin/ralsina-backup, and make it executable [2].
Restoring
To get back the last version of a file from the backup, you could just copy it from the backup directory.
You can restore both recent and old versions using the --restore-as-of option of rdiff-backup.
For example:
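    rdiff-backup --restore-as-of (time) /home/backups/home/ralsina/projects/doc.txt /home/ralsina/projects/doc.txt
Will restore the /home/ralsina/projects/doc.txt file as it was at moment (time). The first path is the file inside the backup, the second is where the restored copy is written (rdiff-backup normally refuses to overwrite an existing file unless you add --force). (time) has one of these formats (quoting the rdiff-backup docs):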
the string "now" (refers to the current time)
a sequence of digits, like "123456890" (indicating the time in seconds after the epoch)
A string like "2002-01-25T07:00:00+02:00" in datetime format
An interval, which is a number followed by one of the characters s, m, h, D, W, M, or Y (indicating seconds, minutes, hours, days, weeks, months, or years respectively), or a series of such pairs. In this case the string refers to the time that preceded the current time by the length of the interval. For instance, "1h78m" indicates the time that was one hour and 78 minutes ago. The calendar here is unsophisticated: a month is always 30 days, a year is always 365 days, and a day is always 86400 seconds.
A date format of the form YYYY/MM/DD, YYYY-MM-DD, MM/DD/YYYY, or MM-DD-YYYY, which indicates midnight on the day in question, relative to the current timezone settings. For instance, "2002/3/5", "03-05-2002", and "2002-3-05" all mean March 5th, 2002.
A backup session specification which is a non-negative integer followed by 'B'. For instance, '0B' specifies the time of the current mirror, and '3B' specifies the time of the 3rd newest increment.
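To make the interval arithmetic concrete, here is a small sketch that converts an interval string to seconds using the fixed-length units quoted above (plus a 7-day week). This is not rdiff-backup's own code, and the function name interval_to_seconds is made up for the example. It only handles intervals, not the other time formats.

```shell
#!/bin/sh
# Hypothetical helper: convert an interval such as "1h78m" to seconds.
# Numbers with leading zeros are not handled (the shell would read
# them as octal).
interval_to_seconds() {
    rest=$1
    total=0
    [ -n "$rest" ] || return 1
    while [ -n "$rest" ]; do
        # Leading digits of what is left, e.g. "78" from "78m".
        number=${rest%%[!0-9]*}
        [ -n "$number" ] || return 1
        rest=${rest#"$number"}
        # The unit is the single character right after the digits.
        unit=${rest%"${rest#?}"}
        rest=${rest#?}
        case $unit in
            s) factor=1 ;;
            m) factor=60 ;;
            h) factor=3600 ;;
            D) factor=86400 ;;
            W) factor=604800 ;;      # 7 days
            M) factor=2592000 ;;     # always 30 days
            Y) factor=31536000 ;;    # always 365 days
            *) return 1 ;;           # missing or unknown unit
        esac
        total=$((total + number * factor))
    done
    echo "$total"
}

interval_to_seconds 1h78m    # prints 8280 (3600 + 78 * 60)
```

So "1h78m" means 8280 seconds before now, exactly the same moment as "2h18m".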
Limiting the Age of Backups
The scripts given above would keep old versions of files forever. That is not practical, since it would require a huge amount of disk space.
To limit that, you can use the --remove-older-than option.
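For example, if we used
    rdiff-backup --remove-older-than 2W /home/backups/home/ralsina/projects
it would remove the old versions (increments) that are more than two weeks old from that backup. The current mirror is kept. The format of the date for removal is the same as the one given above for restoring.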
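One way to fold this into the nightly script is a small function that prunes each backup folder after the backup has run. This is only a sketch: prune_backups and the DRY_RUN variable are names invented here, and the two-week age is just the example value. As far as I know, rdiff-backup refuses to delete more than one increment in a single run unless --force is also given, so check its docs if a run complains.

```shell
#!/bin/sh
# Hypothetical helper: drop increments older than a given age from
# every backup folder passed as an argument.
# Set DRY_RUN=1 to print the commands instead of running them.
prune_backups() {
    age=$1
    shift
    for dir in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "rdiff-backup --remove-older-than $age $dir"
        else
            rdiff-backup --remove-older-than "$age" "$dir"
        fi
    done
}

# Usage, with the folders from the example script:
#   prune_backups 2W /home/backups/home/ralsina/projects \
#                    /home/backups/home/ralsina/.kde
```

Trying it first with DRY_RUN=1 shows what would be run without touching the backups.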
Automating The Process
You can take advantage of cron to make this all run automatically. Just create an entry in root's crontab to do this every night, or whatever.
Here's an example daily backup at midnight:
0 0 * * * /usr/local/bin/ralsina-backup 2>&1 | mail ralsina
This sends a mail to the account ralsina on the local box, containing the output of the command.
If you are not familiar with cron, you can run the script by hand. I suggest you get familiar with it, or use a graphical cron management tool.
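If you prefer to add the entry from a script instead of editing the crontab by hand, something like the following sketch could do it. The function add_cron_entry is made up for this example. It reads the current crontab on standard input and prints it back with the new line appended, unless that exact line is already there, so running it twice does not duplicate the entry.

```shell
#!/bin/sh
# Hypothetical helper: read a crontab on standard input and print it
# with one extra entry appended, unless that exact line already exists.
add_cron_entry() {
    entry=$1
    current=$(cat)
    if [ -n "$current" ]; then
        printf '%s\n' "$current"
    fi
    # -x matches whole lines only, -F takes the entry literally.
    if ! printf '%s\n' "$current" | grep -qxF -e "$entry"; then
        printf '%s\n' "$entry"
    fi
}

# Usage (as root):
#   crontab -l 2>/dev/null |
#       add_cron_entry '0 0 * * * /usr/local/bin/ralsina-backup 2>&1 | mail ralsina' |
#       crontab -
```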
Offering Automatic Backups For All Users
A more advanced backup script could read a certain file (say "backupthis") from each user's folder, and get from there the information on what to backup. Here's a quick and dirty version:
    #!/bin/sh

    # Where the backups go
    target=/home/backups

    for homefolder in /home/*
    do
        if [ -f "$homefolder/backupthis" ]
        then
            # Each line of backupthis names one folder, relative
            # to the user's home folder.
            while read -r folder
            do
                # If the folder being backed up contains ".."
                # something bad is going on
                if echo "$folder" | grep -qF '..'
                then
                    continue
                fi
                mkdir -p "$target/$homefolder/$folder"
                rdiff-backup "$homefolder/$folder" "$target/$homefolder/$folder"
            done < "$homefolder/backupthis"
        fi
    done
Notice that for this script, if I wanted to back up /home/ralsina/projects, in my backupthis file I would have to put only projects.
This is intentional, and the goal is that users can only backup stuff from their own home folders.
This script is not meant for arbitrary input. It's not really well written. If a user really, really wanted to, he could find ways to make it do weird and dangerous stuff (it's quick and dirty, as I said ;-)
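A slightly more careful take on reading the backupthis files is sketched below. It only lists what would be backed up, one source and destination pair per line, so the result can be inspected before anything runs. The function name list_backup_jobs is invented for this example, and the rules (skip blank lines, comments, absolute paths, and anything with a ".." component) are my assumptions about what is reasonable. It still does nothing about symlinks pointing outside the home folder, so it is not a real security boundary either.

```shell
#!/bin/sh
# Hypothetical helper: print one "source<TAB>destination" line for
# every acceptable entry in each user's backupthis file.
list_backup_jobs() {
    homes=$1      # e.g. /home (must be an absolute path)
    target=$2     # e.g. /home/backups
    for homefolder in "$homes"/*; do
        [ -f "$homefolder/backupthis" ] || continue
        # The "|| [ -n ... ]" part keeps a last line that has no newline.
        while IFS= read -r folder || [ -n "$folder" ]; do
            case $folder in
                '' | '#'*) continue ;;                   # blank line or comment
                /*) continue ;;                          # absolute path
                .. | ../* | */.. | */../*) continue ;;   # tries to leave the home folder
            esac
            [ -d "$homefolder/$folder" ] || continue     # nothing to back up
            printf '%s\t%s\n' "$homefolder/$folder" "$target$homefolder/$folder"
        done < "$homefolder/backupthis"
    done
}

# Usage (as root), feeding each pair to rdiff-backup:
#   tab=$(printf '\t')
#   list_backup_jobs /home /home/backups |
#       while IFS=$tab read -r src dest; do
#           mkdir -p "$dest" && rdiff-backup "$src" "$dest"
#       done
```

Folder names containing tabs or newlines would confuse the usage loop, so this is still quick and dirty, just a bit less so.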
Other Possible Uses
You could use almost this exact procedure for keeping two home folders sync'd over the Internet, thus letting you use both computers and have all your data at hand. To do that, read rdiff-backup's information about using it over ssh. However, Unison is probably a better idea.
Final Words
I hope you find this useful. rdiff-backup is a very cool tool, and has served me well in a couple of situations. It's very efficient, and simple to use.
However, if you start using these scripts, I strongly suggest you learn more about rdiff-backup from its docs and its wiki.