Incremental tar backups

GNU tar has a reputation for being slightly awkward to use. Despite this, tar is one of the most frequently used archive commands and is installed by default on most Linux distributions.

A very common use for tar is creating regular backups. However, if you back up a directory that changes infrequently, every archive stores another copy of the files that have not changed. This post looks at using incremental tar files to get around this problem.

A simple tar backup

A very simple backup script might look something like the following:

#!/bin/sh
#
# Create a daily backup of /home/someuser
#
set -e
PATH='/sbin:/bin:/usr/sbin:/usr/bin'

[ -d /var/home_backups ] || mkdir --mode=0700 /var/home_backups

tar -cjf "$(date +'/var/home_backups/%Y-%m-%d_someuser_home.tar.bz2')" /home/someuser

Each time the script above is run, a full, date-stamped tar archive is created. If the script is run daily (e.g. via cron), /var/home_backups will look something like the following after a few days:

+-- /var/home_backups
    +-- 2018-01-21_someuser_home.tar.bz2
    +-- 2018-01-22_someuser_home.tar.bz2
    +-- 2018-01-23_someuser_home.tar.bz2
    +-- 2018-01-24_someuser_home.tar.bz2
    +-- 2018-01-25_someuser_home.tar.bz2
    +-- 2018-01-26_someuser_home.tar.bz2

Unless the contents of /home/someuser change rapidly, each archive will contain copies of the same unchanged files.
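A rough way to see this overlap is to compare the member lists of two daily archives. The sketch below assumes GNU tar, and common_files is a made-up helper name. It only compares file names, not contents, so treat the result as an upper bound on the real duplication.

```shell
#!/bin/sh
# Sketch only: common_files is a hypothetical helper, not part of tar.
#
# common_files ARCHIVE_A ARCHIVE_B
# Prints the names of non-directory members that appear in both archives.
# Names are compared, not contents, so a file that changed between the two
# backups is still reported.
common_files() {
  list_a="$(mktemp)"
  list_b="$(mktemp)"
  # -t lists member names; directory entries end in "/" and are dropped.
  # comm needs sorted input.
  tar -tf "$1" | grep -v '/$' | sort > "$list_a"
  tar -tf "$2" | grep -v '/$' | sort > "$list_b"
  # -12 keeps only the lines present in both lists.
  comm -12 "$list_a" "$list_b"
  rm -f "$list_a" "$list_b"
}
```

For example, piping the output for the archives from the 25th and 26th through wc -l gives the number of file names the two backups share:

    common_files /var/home_backups/2018-01-25_someuser_home.tar.bz2 \
      /var/home_backups/2018-01-26_someuser_home.tar.bz2 | wc -l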

Using incremental tar files

Instead of always using full backups, tar can create incremental backups. This involves two steps:

  1. First create a full backup and a snapshot file:

    tar --listed-incremental full_backup.snar -cjf full_backup.tar.bz2 /home/someuser
    

    This will create two files, a full backup of the directory (full_backup.tar.bz2), and a snapshot file (full_backup.snar). The snapshot file will contain timestamps and file metadata. The snapshot file format is described in the tar docs.

  2. Subsequent backups can then use the snapshot file to create an incremental tar archive which will skip unmodified files:

    cp full_backup.snar increment.snar
    tar --listed-incremental increment.snar -cjf incremental_backup.tar.bz2 /home/someuser
    rm increment.snar
    

    Note: tar updates the snapshot file in place whenever it creates an incremental backup. If you want to create multiple incremental backups from the base archive, make sure tar works on a copy of the snapshot file, as in the cp step above.
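The two steps can be tried out safely in a scratch directory. The following is a minimal sketch, assuming GNU tar. The function name incremental_demo and the file names under src are invented for illustration, and compression (-j) is left out to keep the demo small.

```shell
#!/bin/sh
# Sketch only: incremental_demo and the src/ file names are hypothetical.
#
# incremental_demo WORKDIR
# Builds a tiny source tree in WORKDIR, takes a full backup, edits one file,
# then takes an incremental backup using a copy of the snapshot file.
incremental_demo() {
  workdir="$1"
  mkdir -p "$workdir/src"
  echo 'one' > "$workdir/src/unchanged.txt"
  echo 'two' > "$workdir/src/changed.txt"

  # Step 1: full backup plus snapshot file.
  (cd "$workdir" &&
    tar --listed-incremental full_backup.snar -cf full_backup.tar src)

  # Wait so the edit is clearly newer than the snapshot, then change a file.
  sleep 1
  echo 'two, edited' > "$workdir/src/changed.txt"

  # Step 2: incremental backup from a copy of the snapshot file, so the
  # original snapshot still describes the full backup afterwards.
  (cd "$workdir" &&
    cp full_backup.snar increment.snar &&
    tar --listed-incremental increment.snar -cf incremental_backup.tar src &&
    rm increment.snar)
}
```

After running incremental_demo "$(mktemp -d)", listing incremental_backup.tar with tar -tf should show the src/ directory entry and src/changed.txt, but not src/unchanged.txt.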

Using the steps above, the example script can be updated to something similar to the following:

#!/bin/sh
#
# Create a daily backup of /home/someuser
#
set -e
PATH='/sbin:/bin:/usr/sbin:/usr/bin'

[ -d /var/home_backups ] || mkdir --mode=0700 /var/home_backups

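if [ -f /var/home_backups/full_backup.snar ]; then
  # Work on a copy of the snapshot so that every incremental backup stays
  # relative to the full backup.
  snapshot_copy="$(mktemp)"
  # Remove the copy on exit, even if tar fails and set -e aborts the script.
  trap 'rm -f "$snapshot_copy"' EXIT
  cp /var/home_backups/full_backup.snar "$snapshot_copy"
  tar --listed-incremental "$snapshot_copy" \
    -cjf "$(date +'/var/home_backups/%Y-%m-%d_someuser_home.tar.bz2')" /home/someuser
else
  # First run: create the full backup and the base snapshot file.
  tar --listed-incremental /var/home_backups/full_backup.snar \
    -cjf /var/home_backups/full_backup.tar.bz2 /home/someuser
fi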

The first time this script runs, it creates a full backup. Subsequent runs create incremental backups based on that full backup:

+-- /var/home_backups
    +-- 2018-01-21_someuser_home.tar.bz2
    +-- 2018-01-22_someuser_home.tar.bz2
    +-- 2018-01-23_someuser_home.tar.bz2
    +-- 2018-01-24_someuser_home.tar.bz2
    +-- 2018-01-25_someuser_home.tar.bz2
    +-- 2018-01-26_someuser_home.tar.bz2
    +-- full_backup.snar
    +-- full_backup.tar.bz2

Restoring incremental backups

To restore an incremental backup, the base archive needs to be extracted, followed by the incremental backup:

tar --listed-incremental /dev/null -xf /var/home_backups/full_backup.tar.bz2
tar --listed-incremental /dev/null -xf /var/home_backups/2018-01-25_someuser_home.tar.bz2

The --listed-incremental option is required to ensure that files deleted before the final incremental archive was created are not present in the restored directory. To achieve this, tar stores additional metadata for directories in each archive. The metadata can be viewed by using two verbose (-v) options:

$ tar --incremental -tvvf /var/home_backups/2018-01-26_someuser_home.tar.bz2
drwx------ someuser/someuser 71 2018-01-26 22:36 home/someuser/
N .bash_logout
N .bash_profile
N .bashrc
Y file.0
Y file.2
N file.5
Y wibble

-rw-r--r-- someuser/someuser 1048576 2018-01-26 22:36 home/someuser/file.0
-rw-r--r-- someuser/someuser 1048576 2018-01-26 22:36 home/someuser/file.2
-rw-r--r-- someuser/someuser      94 2018-01-26 22:36 home/someuser/wibble

Note: the snapshot file is not required to restore incremental backup files because the metadata embedded in the archive is sufficient.
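The restore steps can be wrapped in a small helper. This is a minimal sketch, assuming GNU tar. The function name restore_backup and its arguments are invented for illustration. It extracts into a separate target directory rather than the current one, and it passes /dev/null in place of a snapshot file, as in the commands above.

```shell
#!/bin/sh
# Sketch only: restore_backup is a hypothetical helper, not part of tar.
#
# restore_backup FULL_ARCHIVE INCREMENTAL_ARCHIVE TARGET_DIR
# Extracts the full backup and then one incremental backup into TARGET_DIR.
# Both archive paths should be absolute, because extraction runs from inside
# TARGET_DIR.
restore_backup() {
  full_archive="$1"
  incremental_archive="$2"
  target_dir="$3"

  mkdir -p "$target_dir"
  # Extract the base archive first, then the incremental archive on top.
  # In incremental mode tar also removes files that the incremental archive's
  # directory metadata no longer lists.
  (cd "$target_dir" &&
    tar --listed-incremental /dev/null -xf "$full_archive" &&
    tar --listed-incremental /dev/null -xf "$incremental_archive")
}
```

For example, to restore the state as of 2018-01-25 into a scratch directory (/tmp/restore is just an example path):

    restore_backup /var/home_backups/full_backup.tar.bz2 \
      /var/home_backups/2018-01-25_someuser_home.tar.bz2 /tmp/restore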