Empty trees in Git

After using Git for a little while, there is a reasonable chance you will run across the following hash:

4b825dc642cb6eb9a060e54bf8d69288fbee4904

So where does it come from, and why should you care?

Where does the hash come from?

Every Git repository, even an empty one, can resolve this hash. You can verify this with git show:

$ git show 4b825dc642cb6eb9a060e54bf8d69288fbee4904
tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904

So where does this hash come from? Internally, Git keeps track of a few different object types. The most fundamental is the blob, which represents file content. Blobs are referenced by tree objects, which represent directories, and tree objects are in turn referenced by commit objects. The diagram below gives a quick overview of this:

Git object diagram taken from Pro Git.

The diagram above is taken from Pro Git and licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported license.

Note: the Git Internals chapter from Pro Git has more info on objects if you're still curious.
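The three object types can also be inspected directly with git cat-file -t, which prints the type of any object. The following is a minimal sketch in a scratch repository; the file name greeting.txt, the throwaway identity passed to git commit and the variable names are assumptions made for illustration only.

```shell
# Scratch repository with a single commit (all names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'hello' > greeting.txt
git add greeting.txt
git -c user.name='Example' -c user.email='example@example.com' commit -q -m 'Add greeting'

# Ask Git for the type of the commit, its root tree, and the file inside it.
commit_type=$(git cat-file -t HEAD)
tree_type=$(git cat-file -t 'HEAD^{tree}')
blob_type=$(git cat-file -t HEAD:greeting.txt)
echo "$commit_type -> $tree_type -> $blob_type"
# prints: commit -> tree -> blob
```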

So how does the hash fit in? It is the hash of an empty tree. This can be verified by creating an object hash for either /dev/null or an empty string:

$ git hash-object -t tree /dev/null
4b825dc642cb6eb9a060e54bf8d69288fbee4904

$ printf '' | git hash-object -t tree --stdin
4b825dc642cb6eb9a060e54bf8d69288fbee4904
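The result never varies because Git names an object by hashing a short header ("<type> <size>" followed by a NUL byte) together with the object's content, and an empty tree has no content. As a rough sketch, the same value can be reproduced without Git at all. This assumes the default SHA-1 object format and that a sha1sum binary (GNU coreutils) is available; the variable name is only illustrative.

```shell
# Object ID = SHA-1 of "<type> <size>\0" followed by the content.
# For an empty tree the size is 0 and nothing follows the header.
empty_tree=$(printf 'tree 0\0' | sha1sum | cut -d ' ' -f 1)
echo "$empty_tree"
# prints: 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

On systems without sha1sum, another SHA-1 tool fed the same bytes should give the same digest.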

Using the hash

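The empty tree hash is often used with git diff. For example, if you wanted to check for whitespace errors in a directory, you could use the --check option and compare HEAD against the empty tree. Because every file in HEAD counts as newly added relative to the empty tree, the check covers the full content of the directory rather than just recent changes: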

$ git diff $(git hash-object -t tree /dev/null) HEAD --check -- po
po/ca.po:7: trailing whitespace.
+# Terminologia i criteris utilitzats
po/ru.po:4: trailing whitespace.
+#
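To try the same check without an existing project, a scratch repository is enough. The sketch below is illustrative only: the file name notes.txt, the throwaway identity passed to git commit and the variable names are assumptions. It relies on git diff --check exiting with a non-zero status when it finds problems.

```shell
# Scratch repository containing one file with trailing whitespace on line 2.
repo=$(mktemp -d)
cd "$repo"
git init -q
printf 'clean line\ntrailing space \n' > notes.txt
git add notes.txt
git -c user.name='Example' -c user.email='example@example.com' commit -q -m 'Add notes'

# Compare the whole of HEAD against the empty tree and capture the result.
empty_tree=$(git hash-object -t tree /dev/null)
check_status=0
check_output=$(git diff --check "$empty_tree" HEAD) || check_status=$?

echo "$check_output"
echo "exit status: $check_status"  # non-zero, because line 2 ends with a space
```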

The empty tree hash is also very useful when writing Git hooks. A fairly common pattern is to validate new commits before accepting them, with code similar to the following:

for changed_file in $(git diff --cached --name-only --diff-filter=ACM HEAD)
do
  if ! validate_file "$changed_file"; then
    echo "Aborting commit"
    exit 1
  fi
done

This works fine if there are previous commits. However, the HEAD reference will not exist if no commits have been made. To get around this, the empty tree hash can be used when checking the initial commit:

if git rev-parse --verify -q HEAD > /dev/null; then
  against=HEAD
else
  # Initial commit: diff against an empty tree object
  against="$(git hash-object -t tree /dev/null)"
fi

for changed_file in $(git diff --cached --name-only --diff-filter=ACM "$against")
do
  if ! validate_file "$changed_file"; then
    echo "Aborting commit"
    exit 1
  fi
done
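The fallback can be observed in a brand-new repository, where HEAD does not yet resolve to a commit. This is a small sketch rather than a complete hook: the file name first.txt and the staged variable are hypothetical, and validate_file is left out so the example only shows which files the loop would receive.

```shell
# Fresh repository with one staged file and no commits yet.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'first file' > first.txt
git add first.txt

# Same selection logic as the hook: HEAD if it exists, else the empty tree.
if git rev-parse --verify -q HEAD > /dev/null; then
  against=HEAD
else
  against=$(git hash-object -t tree /dev/null)
fi

# List the staged files relative to the chosen base.
staged=$(git diff --cached --name-only --diff-filter=ACM "$against")
echo "against=$against"
echo "staged=$staged"
# prints:
# against=4b825dc642cb6eb9a060e54bf8d69288fbee4904
# staged=first.txt
```

Note that iterating over the output with a plain for loop, as in the hook above, splits on whitespace, so file names containing spaces would need different handling.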