Decrypting Android backups with Python

Following on from last week's post, this post looks in detail at how to unpack and decrypt Android backups with Python. The focus is primarily on the file format and how it's processed. If you're only interested in unpacking backups, I would recommend having a look at Android backup extractor, which was covered in the previous post.

Header format

The first four lines of an Android backup file will look similar to the following:

ANDROID BACKUP
4
1
AES-256

The first line is a "magic" string used to identify the file as an Android backup. The next two lines specify the file format version and the compression flag (1 if the payload is compressed, 0 if not). Finally, the last line indicates the type of encryption being used; at the time of writing, none and AES-256 are the only valid options. Parsing these lines is just a case of reading the file line by line with code similar to the following:

header = {}
with open(backup_file, 'rb') as backup:
    if backup.readline() != b'ANDROID BACKUP\n':
        raise AndroidBackupParseError('Unrecognised file format!')

    header['format_version'] = int(backup.readline())
    header['compression_version'] = int(backup.readline())
    header['encryption'] = backup.readline().decode('utf-8').strip()

If the file is encrypted there will be five additional lines similar to the following:

A565DBC120C063F9...
E14901585B953A94...
10000
8B79E3E53050B873...
69EE0E7EC88799B3...

These lines contain the following information:

  1. The user password salt (hex)
  2. The master key checksum salt (hex)
  3. The number of PBKDF2 rounds used with HMAC
  4. The IV of the user key (hex)
  5. A blob encrypted with the user key containing the IV of the master key, master key itself, and a master key checksum hash (hex)

Parsing these lines is just a case of continuing to read the header line by line, and converting the hex to bytes where appropriate:

header['user_salt'] = bytes.fromhex(backup.readline().decode('utf-8').strip())
header['checksum_salt'] = bytes.fromhex(backup.readline().decode('utf-8').strip())
header['pbkdf2_rounds'] = int(backup.readline())
header['user_iv'] = bytes.fromhex(backup.readline().decode('utf-8').strip())
header['master_key_blob'] = bytes.fromhex(backup.readline().decode('utf-8').strip())
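
Putting the two header snippets together, a minimal sketch of a complete header parser might look like the following. The function name parse_header and the _read_hex helper are illustrative rather than taken from the proof of concept script, the header values in the example are made up, and the sketch assumes the stream was opened in binary mode.

```python
import io


class AndroidBackupParseError(Exception):
    """Raised when the file does not look like an Android backup."""


def _read_hex(stream):
    # Hypothetical helper: read one header line and convert its hex to bytes
    return bytes.fromhex(stream.readline().decode('utf-8').strip())


def parse_header(stream):
    """Parse the text header from a binary file-like object.

    The stream is left positioned at the start of the payload.
    """
    if stream.readline() != b'ANDROID BACKUP\n':
        raise AndroidBackupParseError('Unrecognised file format!')

    header = {
        'format_version': int(stream.readline()),
        'compression_version': int(stream.readline()),
        'encryption': stream.readline().decode('utf-8').strip(),
    }

    # The five extra lines are only present for encrypted backups
    if header['encryption'] == 'AES-256':
        header['user_salt'] = _read_hex(stream)
        header['checksum_salt'] = _read_hex(stream)
        header['pbkdf2_rounds'] = int(stream.readline())
        header['user_iv'] = _read_hex(stream)
        header['master_key_blob'] = _read_hex(stream)

    return header


# Made-up header values, used purely for illustration
example = io.BytesIO(b'ANDROID BACKUP\n4\n1\nAES-256\n'
                     b'A565\nE149\n10000\n8B79\n69EE\npayload')
header = parse_header(example)
print(header['pbkdf2_rounds'], header['user_salt'].hex(), example.read())
```

Taking a file-like object rather than a filename means the same function works on an open file and on an in-memory buffer, and whatever is read from the stream afterwards is the payload.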

Note: the BackupManagerService source code is worth looking at for additional header information.

Decrypting the master key

Encrypted Android backups are encrypted using AES-256. There are in fact two keys: a "master" key, which encrypts the actual data, and a "user" key, which is used to encrypt the master key. The first step in decrypting the master key is generating the user key using PBKDF2 with HMAC-SHA1. The process converts an Android backup password and the user salt into a 256-bit key. This can be done with code similar to the following:

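import hashlib

PBKDF2_KEY_SIZE = 32  # bytes, i.e. a 256-bit key

# Derive the user key from the backup password and the user salt
key = hashlib.pbkdf2_hmac('sha1', password.encode('utf-8'),
                          header['user_salt'],
                          header['pbkdf2_rounds'], PBKDF2_KEY_SIZE)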

Once you have the key, it can be used together with the user IV to decrypt the master key blob. Unfortunately the Python standard library doesn't have an AES module; there are, however, third-party modules such as pyaes which can be used:

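import pyaes

cipher_text = header['master_key_blob']

aes = pyaes.AESModeOfOperationCBC(key, header['user_iv'])

# The raw CBC mode object decrypts exactly one 16-byte block per call
plain_text = b''
while len(plain_text) < len(cipher_text):
    offset = len(plain_text)
    plain_text += aes.decrypt(cipher_text[offset:(offset + 16)])

# plain_text still ends with padding bytes; the length-prefixed
# parsing that follows never reads that far, so they can be ignored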

The decrypted master key blob can then be parsed with code similar to the following:

import io

# Each field is a single length byte followed by that many bytes of data
blob = io.BytesIO(plain_text)
master_iv_length = ord(blob.read(1))
master_iv = blob.read(master_iv_length)
master_key_length = ord(blob.read(1))
master_key = blob.read(master_key_length)
master_key_checksum_length = ord(blob.read(1))
master_key_checksum = blob.read(master_key_checksum_length)
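
The layout being parsed is three length-prefixed fields in a row. As a rough sketch, the same parsing can be wrapped in a function that also fails loudly when a field is cut short, which can happen when a wrong password turns the blob into garbage. The names read_field and parse_master_key_blob are illustrative, and the example blob is fabricated for the demonstration.

```python
import io


def read_field(blob):
    """Read one length-prefixed field: a length byte, then that many bytes."""
    length_byte = blob.read(1)
    if not length_byte:
        raise ValueError('Blob ended before a length byte')
    data = blob.read(length_byte[0])
    if len(data) != length_byte[0]:
        raise ValueError('Blob ended in the middle of a field')
    return data


def parse_master_key_blob(plain_text):
    """Split the decrypted blob into (master_iv, master_key, checksum)."""
    blob = io.BytesIO(plain_text)
    master_iv = read_field(blob)
    master_key = read_field(blob)
    master_key_checksum = read_field(blob)
    return master_iv, master_key, master_key_checksum


# Fabricated blob: 16-byte IV, 32-byte key, 32-byte checksum, then padding
fake_blob = (bytes([16]) + b'I' * 16 +
             bytes([32]) + b'K' * 32 +
             bytes([32]) + b'C' * 32 +
             b'\x0d' * 13)
iv, key, checksum = parse_master_key_blob(fake_blob)
print(len(iv), len(key), len(checksum))
```

A truncated field is only a hint that something went wrong; the checksum comparison remains the real test of whether the password was correct.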

Verifying the checksum

Once the master key has been decrypted, PBKDF2 can be used to regenerate the checksum and verify that the key (and therefore the password) is correct. This can be done with code similar to the following:

checksum = hashlib.pbkdf2_hmac('sha1', master_key, header['checksum_salt'],
                               header['pbkdf2_rounds'], PBKDF2_KEY_SIZE)

# master_key_checksum is the value parsed out of the decrypted blob
if master_key_checksum != checksum:
    raise AndroidBackupParseError('Invalid decryption password')

Unfortunately this is only true for earlier Android backup versions. From version 2 onwards the key is converted to a "UTF-8 byte array" before being passed to PBKDF2 to calculate the checksum. The Java Bouncy Castle library has a function called Strings.toUTF8ByteArray which does this conversion. A Python function similar to the following can be used to convert the key in a similar way before it's passed into the PBKDF2 function:

def convert(input_bytes):
    output = []
    for byte in input_bytes:
        if byte < 0x80:
            output.append(byte)
        else:
            # Java sign-extends bytes >= 0x80 to chars 0xFF80-0xFFFF,
            # which always encode as three UTF-8 bytes
            output.append(0xef)
            output.append(0xbc | (byte >> 6))
            output.append(0x80 | (byte & 0x3f))
    return bytes(output)

Note: Python doesn't have a char type, and treats bytes as unsigned, unlike Java which treats bytes as signed. As a result the function above differs from Strings.toUTF8ByteArray.
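
To make the signed-byte behaviour concrete, the sketch below reproduces the conversion a different way: each byte is sign-extended to a 16-bit value, as a Java cast from byte to char would do, and the resulting characters are then encoded as UTF-8. It also picks the PBKDF2 input based on the format version. The names java_style_utf8 and compute_checksum are illustrative and not part of any library.

```python
import hashlib

PBKDF2_KEY_SIZE = 32


def java_style_utf8(key):
    """Mimic Java widening each signed byte to a char, then UTF-8 encoding."""
    # Bytes >= 0x80 are negative in Java, so they become 0xFF80-0xFFFF
    chars = ''.join(chr(b if b < 0x80 else 0xFF00 | b) for b in key)
    return chars.encode('utf-8')


def compute_checksum(master_key, checksum_salt, rounds, format_version):
    """Regenerate the master key checksum for the given format version."""
    # From version 2 onwards the key is converted before PBKDF2
    if format_version >= 2:
        master_key = java_style_utf8(master_key)
    return hashlib.pbkdf2_hmac('sha1', master_key, checksum_salt,
                               rounds, PBKDF2_KEY_SIZE)


print(java_style_utf8(b'\x41\x80\xff').hex())  # 41efbe80efbfbf
```

Bytes below 0x80 pass through unchanged, so a key that happens to contain only such bytes produces the same checksum under either format version.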

Decrypting the payload

Assuming the master key and master key IV have been successfully decrypted, the last step is to decrypt and then decompress the payload. This can be done with code similar to the following:

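import zlib

# master_key and master_iv are the values parsed out of the decrypted blob
decrypter = pyaes.Decrypter(pyaes.AESModeOfOperationCBC(master_key,
                                                        master_iv))
# The final feed() with no arguments flushes the decrypter and strips padding
data = decrypter.feed(backup.read()) + decrypter.feed()
tar_data = zlib.decompress(data)

with open('output.tar', 'wb') as output_tar:
    output_tar.write(tar_data)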

Note: the code above reads the whole backup payload into memory. This is fine for small backup files, but it will be a problem for larger backups.
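
One way to avoid holding the whole payload in memory is to decompress it incrementally with zlib.decompressobj and write each piece out as it is produced. The sketch below covers only the decompression half: it assumes the chunks it receives are already decrypted (or come from an unencrypted backup), and the names read_chunks and decompress_stream are illustrative.

```python
import io
import zlib

CHUNK_SIZE = 64 * 1024


def read_chunks(stream, chunk_size=CHUNK_SIZE):
    """Yield successive chunks from a binary file-like object."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def decompress_stream(chunks, output):
    """Decompress an iterable of zlib-compressed chunks into output."""
    decompressor = zlib.decompressobj()
    for chunk in chunks:
        output.write(decompressor.decompress(chunk))
    # Write out anything still buffered inside the decompressor
    output.write(decompressor.flush())


# Stand-in for a payload; a real one would be read from the backup file
payload = io.BytesIO(zlib.compress(b'tar data ' * 10000))
output_tar = io.BytesIO()
decompress_stream(read_chunks(payload, chunk_size=128), output_tar)
print(len(output_tar.getvalue()))  # 90000
```

For an encrypted backup, the decrypted chunks could in principle come from calling decrypter.feed on each encrypted chunk, followed by a final decrypter.feed() to flush the last block, as in the snippet above.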

Source code

A proof of concept Python script for the steps above is available on GitHub. However, I would still recommend using Android backup extractor instead unless you're interested in playing with the Python code.