Like a Hot Knife Through Butter

Recently, an interesting line of attacks against software has been revisited, namely Hash-DoS: in a nutshell, exploiting weak hash functions used in hash table implementations to trigger a denial of service.

To the best of my knowledge, this problem was exposed as early as 1998 in Phrack by Solar Designer. Variants were then discussed by Crosby and Wallach at USENIX Security 2003, who formally defined algorithmic complexity attacks; by Klink and Wälde during 28c3 in 2011, who applied the idea to PHP, Python, Java, Ruby, etc.; and more recently by Aumasson, Bernstein and Bosslet (see their slides from AppSec Forum 2012, and their upcoming talk at 29c3), who showed that the proposed solutions, essentially randomizing the hash function, were not always as effective as expected.

Technically, this kind of attack consists in generating a large number of colliding inputs for the hash table. Instead of an O(1) average access time to a stored element, one can thus force the hash table into an O(n) one. If one then explores all the elements in the hash table, the worst-case complexity becomes O(n^2) instead of O(n). The practical ability to generate multi-collisions (i.e., many inputs, not just two, mapping to the same output) hence depends on the properties of the hash function that maps elements to keys.
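To make the complexity blow-up concrete, here is a toy separate-chaining hash table (an illustrative sketch, not btrfs code): once an adversarial hash function sends every key to the same bucket, a lookup degrades from a single probe to a scan of all n stored entries.

```python
# Toy separate-chaining hash table. With a weak (here: constant) hash
# function, every key lands in the same bucket and lookups become O(n).

class ToyHashTable:
    def __init__(self, nbuckets=64):
        self.buckets = [[] for _ in range(nbuckets)]

    def _index(self, key, weak_hash):
        return weak_hash(key) % len(self.buckets)

    def insert(self, key, value, weak_hash):
        self.buckets[self._index(key, weak_hash)].append((key, value))

    def probes(self, key, weak_hash):
        # Number of entries scanned before the key is found.
        bucket = self.buckets[self._index(key, weak_hash)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                return i + 1
        return len(bucket)

weak_hash = lambda k: 0  # adversarial worst case: everything collides
table = ToyHashTable()
for i in range(1000):
    table.insert("f%d" % i, i, weak_hash)

# The first key is found in 1 probe, the last one only after 1000 probes.
assert table.probes("f0", weak_hash) == 1
assert table.probes("f999", weak_hash) == 1000
```

Filling such a degenerate table with n elements hence costs 1 + 2 + … + n probes, which is the O(n^2) worst case mentioned above.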

In this short post, I’d like to show how hash-DoS can be applied to the btrfs file system, with some astonishing and unexpected success. Btrfs, while still in its development stage, is widely considered a viable successor to ext4, and an implementation of it is already part of the Linux kernel. According to this page,

Directories are indexed in two different ways. For filename lookup, there is an index comprised of keys:

Directory Objectid BTRFS_DIR_ITEM_KEY 64 bit filename hash

The default directory hash used is crc32c, although other hashes may be added later on. A flags field in the super block will indicate which hash is used for a given FS.

The second directory index is used by readdir to return data in inode number order. This more closely resembles the order of blocks on disk and generally provides better performance for reading data in bulk (backups, copies, etc). Also, it allows fast checking that a given inode is linked into a directory when verifying inode link counts. This index uses an additional set of keys:

Directory Objectid BTRFS_DIR_INDEX_KEY Inode Sequence number

The inode sequence number comes from the directory. It is increased each time a new file or directory is added.

Knowing how trivial it is to compute multi-collisions for a CRC-based hash, I could not resist playing a bit. Roughly speaking, computing the CRC of an n-bit message M consists in interpreting the message as a polynomial M(x) of degree n-1 over \mathrm{GF}(2), dividing it by the CRC-defining polynomial P(x), and taking the remainder; in other words, \mathrm{CRC}(M) = M(x) - Q(x)\cdot P(x) for some quotient Q(x). Obviously, adding any multiple of P(x) to the message M(x) will generate a collision. For the gory details, see this page.
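Both consequences of this algebra are easy to check from Python, using zlib’s CRC32 as a stand-in for crc32c (different polynomial, same structure): the CRC is linear over GF(2), and appending a message’s own little-endian CRC drives the CRC of the result to a fixed residue, whatever the message was.

```python
import os
import struct
import zlib

# 1. Linearity: for equal-length messages a and b,
#    CRC(a ^ b) = CRC(a) ^ CRC(b) ^ CRC(0...0),
#    so XORing any multiple of P(x) into a message leaves its CRC unchanged.
a, b, zero = os.urandom(16), os.urandom(16), bytes(16)
a_xor_b = bytes(x ^ y for x, y in zip(a, b))
assert zlib.crc32(a_xor_b) == zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(zero)

# 2. Appending a message's own CRC (little-endian) forces the CRC of the
#    result to the well-known residue 0x2144DF1C -- an instant
#    multi-collision generator.
def extend(m):
    return m + struct.pack('<I', zlib.crc32(m))

for _ in range(5):
    assert zlib.crc32(extend(os.urandom(12))) == 0x2144DF1C
```

The same two properties hold for crc32c, only with a different polynomial, which is all the forging scripts below rely on.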

I basically found two different attacks:

  • I measured the time needed to create 4000 empty files with randomly chosen names in the same directory. This takes about 0.2 seconds. The box used was a Fedora distribution running in a VM (with btrfs on a loopback device).

    Then, I measured the time needed to create those same 4000 empty files in the same directory, with names chosen instead so that they all hash to the same CRC32C value. This operation fails after 5 (!) seconds, having created only 61 files. In other words, this first attack allows an adversary, in a shared-directory scenario, to prevent a victim from creating a file with a name known in advance. According to the btrfs maintainer, Chris Mason,

Collisions are a known issue with any of the hash based directories. […] The impact of the DOS is that a malicious user is able to prevent the creation of specific file names. For this to impact other users on the system, it must be done in directories where both the malicious user and the victim user have permission to create files. The obvious example is /tmp, but there are other cases that may be more site-specific. Group writable directories have other security issues, and so we picked the hash knowing this kind of DOS was possible. It is good practice to avoid the shared directories completely if you’re worried about users doing these kinds of attacks.

  • A bit annoyed by this answer, I tried harder and found the following: I created several files with random names in a directory (around 500). The time required to remove them was negligible. Then I created the same number of files, but gave them only 55 different crc32c values. The time required to remove them was so large that I could not even measure it, and I killed the process after 220 minutes (!). The Python script I used is the following; it borrows some code from StalkR:

    More exactly, I mounted a 1GB btrfs file system on a loopback device:

    In the exploit script, just set the variable  hack = False  to generate random empty filenames, or  hack = True  to generate colliding filenames and hence trigger the phenomenon. Here is a screenshot of what I obtained:
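For readers without the original script at hand, a minimal colliding-name generator can be sketched as follows. It uses zlib’s CRC32 rather than crc32c (so these exact names would not collide inside btrfs), but the principle is identical: appending a message’s own little-endian CRC forces the CRC of the result to the fixed residue 0x2144DF1C, so every generated name hashes to the same 32-bit value.

```python
import os
import struct
import zlib

def colliding_names(count, length=12):
    """Yield `count` distinct names that all share the same CRC32 value.

    Appending a message's own little-endian CRC forces the CRC of the
    result to a fixed residue (0x2144DF1C for zlib's CRC32). Names
    containing '/' or NUL are skipped, since those bytes are not legal
    in file names.
    """
    seen = set()
    while len(seen) < count:
        name = os.urandom(length)
        name += struct.pack('<I', zlib.crc32(name))
        if b'/' in name or b'\x00' in name or name in seen:
            continue
        seen.add(name)
        yield name

names = list(colliding_names(100))
assert len({zlib.crc32(n) for n in names}) == 1  # all 100 names collide
```

Creating files with such names in one directory reproduces the degenerate single-bucket behaviour described above, for any file system or application hashing names with the matching CRC.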

Given the result, it looks like playing with collisions is much more likely to trigger an infinite loop than just a complexity increase; at the very least, the btrfs code surely does not expect to handle many collisions.

Essentially, to thwart both attacks, I would recommend using a modern lightweight hash algorithm, such as SipHash, instead of CRC32C. Another alternative is to avoid using data structures with a high worst-case complexity, like hash tables, for storing data that can potentially be manipulated by malicious users. At the price of a bit of average-case performance, data structures like red-black trees guarantee a search time in O(\log(n)) (I learned this while reading the source code of nginx).
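SipHash itself is not exposed by the Python standard library, but the underlying idea, keying the hash with a secret the attacker cannot learn, can be sketched with hashlib.blake2b, which accepts a key (the key name and its storage below are hypothetical, for illustration only):

```python
import hashlib
import os

# Hypothetical per-filesystem secret key, generated once (e.g. at mkfs
# time) and never revealed to unprivileged users.
SECRET_KEY = os.urandom(16)

def keyed_name_hash(name: bytes) -> int:
    """Return a 64-bit keyed filename hash.

    Without knowledge of SECRET_KEY, an attacker cannot predict the hash
    of any name, so precomputing colliding file names offline becomes
    infeasible.
    """
    digest = hashlib.blake2b(name, key=SECRET_KEY, digest_size=8).digest()
    return int.from_bytes(digest, 'little')

# Names forged to collide under an unkeyed CRC no longer collide here.
h1 = keyed_name_hash(b"file-a")
h2 = keyed_name_hash(b"file-b")
assert h1 != h2
```

The design point is that the key must stay secret: a keyed hash with a public key degenerates back to the CRC situation, since the attacker can then search for collisions offline.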

For the record, this vulnerability was announced to the btrfs maintainer Chris Mason on November 14th, 2012. He acknowledged the bug (though he subsequently did not answer any of my e-mails) and mentioned that

 My plan is to get this fixed for the 3.8 merge window. We’re juggling a lot of patches right now but I expect to fix things then.

[UPDATE OF 17/12/2012] As several readers of this post have noticed (and I would like to warmly thank them for their feedback), the second attack does NOT trigger an infinite loop within the btrfs code, but merely within the bash expansion code responsible for expanding the command line rm *. This can be seen in the above screenshot: the CPU is burnt in userland, not in the kernel. Hence, what I thought was a complexity attack against the btrfs file system is actually a (less glamorous) complexity attack against bash.

This said, after I communicated this unfortunate glitch to the btrfs maintainer Chris Mason, he kindly answered the following:

You’ve found a real problem in btrfs though. Changes since I tested the crc overflow handling have made us not deal with our EOVERFLOW error handling completely right, and there are cases where we force the FS readonly when we shouldn’t. So this is a valid bug, I’m just waiting on some review of my fix, which will get backported to a number of kernels.

To summarize, the message I wanted to convey in this post remains valid: if one uses a weak hash function, as is the case in the btrfs file system, one should assume that malicious users can generate zillions of collisions, and one should accordingly write robust code able to handle those collisions efficiently.

Another possibility consists in using a lightweight cryptographic hash function that turns the search for multi-collisions into a hard task. The security/performance tradeoff to find here is definitely a delicate decision.

Finally, the first described attack, i.e., making it impossible to create a given file name within a shared directory, remains valid.



  1. I think the main problem is that, while BTRFS is still completely broken, it is already offering functionality beyond belief.
    It will take a long, long time before this large codebase will be considered ready for real use.

  2. If only the B-Tree Filesystem had access to a data structure immune to this… like a B-Tree.

  3. Funny thing is that all these problems and more have already been found, solved and open-sourced in the form of ZFS. it’s time tested and fully production worthy, but (IMHO) due to a myopic and mulish fixation on GPL, remains relatively unavailable to Linux users. I don’t really mind because I’d _much_ rather rely on the likes of Illumos and FreeBSD, but it is nonetheless a shame.

    I can’t remember where I first heard it, but it’s stuck with me: “BTRFS is two years from mainstream adoption and always will be”. To all those out there with Linux filesystem woes I say: “The answer is out there, Neo, and it’s looking for you, and it will find you if you want it to”.

  4. Hi,

    Using a real hash function is definitely better than using a checksum, but additionally one has to use a secret salt in order to prevent DoS by collision attack (hash flooding). Perhaps a per-filesystem salt is enough; perhaps a per-directory salt might be better.

    Ext3 (and thus Ext4) seems to be using a per-filesystem salt known as the “Directory Hash Seed” and can use various hash functions, see “Default directory hash”. Both can be checked with tune2fs -l /dev/
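For illustration, the salting idea described in this comment can be sketched with Python’s hashlib.blake2b, which takes a salt parameter (the function below is a hypothetical sketch, not ext3/4’s actual directory-hash scheme):

```python
import hashlib
import os

def salted_name_hash(name: bytes, salt: bytes) -> int:
    # blake2b accepts a salt of up to 16 bytes; each filesystem (or even
    # each directory) would store its own randomly generated salt.
    digest = hashlib.blake2b(name, salt=salt, digest_size=8).digest()
    return int.from_bytes(digest, 'little')

salt_a, salt_b = os.urandom(16), os.urandom(16)
# The same name hashes differently under different salts, so collisions
# precomputed against one filesystem are useless against another.
assert salted_name_hash(b"collide-me", salt_a) != salted_name_hash(b"collide-me", salt_b)
```

Note that a salt only helps against precomputation if the attacker cannot read it; a salt readable via tools like tune2fs must be combined with a hash that is hard to invert or collide even with a known salt.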

  5. And today’s useless use of cat award goes to…

    A simple “grep ‘hack =’ FILENAME” would’ve done it 😉

    • @Jan: Shame on me, I’ll surely burn in *nix hell on December 21st, 2012 !

  6. @Cliff, I’m puzzled by “myopic and mulish”. What set of people do you believe need to change their mind/attitude, or what action do you think needs to take place, for ZFS to be acceptable into the Linux kernel?

  7. I strongly suspect this is not a BTRFS issue at all. Perhaps you could try comparing results with other file systems as well? FWIW, I ran on EXT4, XFS and BTRFS formatted partitions and had *no* issues removing the files. On one trial run with hack=True and ITERATIONS=1000 (i.e. 55000 files), I even got more files on BTRFS (54268) than on EXT4. Removing them took less than 5 seconds. All this was done on an openSUSE 12.1 machine.

    As for the infinite loop theory, rm -f * seems to stall in your case, perhaps due to the Bash expansion of the ‘*’. Try removing the files using xargs (“find . -print0 | xargs -0 rm”).

    • @Ravi: EXT4 and XFS use different hash functions, so it’s clear that the exploit won’t work with them. The file generation process is randomized, and forged names containing a “/” or a NUL character have to be removed, which explains why the number of files varies from one run of the Python script to another.

  8. From what I know (I’m not a btrfs developer, I’ve just had a look at its data structures), these items with “Directory Objectid, BTRFS_DIR_ITEM_KEY, 64 bit filename hash” as key are stored in a B-tree together with all other inode-related items, as you can see in the green square on . Thus we are not talking about a hash table here, but about a B-tree with a worst-case complexity for search, insert and delete of O(log(n)). ext3 and ext4 use a very similar directory index structure for large directories, called HTree, which uses the same principle, i.e. it hashes the filename and stores it in a tree (which is even similar to a B-tree), so this concept of using a hashed filename is actually nothing new in Btrfs. However, as Yann Droneaud pointed out, it seems that Btrfs misses the salt for the hash function that is present in ext3/4.

  9. @Jon — It’s hard to point to a single person or group of people, but there is definitely a pervasive notion that combining GPL code with that licensed under other open source agreements somehow weakens or corrupts it. I know Linus feels strongly about this, as does Stallman. Not that I begrudge them this — they have certainly earned the right to feel strongly about things!

    Admittedly, the way the GPL is written makes it difficult and cumbersome to incorporate software under different licenses. Nonetheless, it has been done with varying degrees of success in the form of device drivers and projects like ZFS On Linux. I just feel like it’s a shame to exert all this human effort to re-invent wheels based simply on adherence to license purity. Also, I would argue that the LAST thing the world needs is yet another filesystem.

    Anyhow, I’ve probably not done a very good job of answering your question. Some say the GPL is too strict, while others say it’s just right. In my experience you haven’t got it just right until you have some people making a third argument that it’s too permissive. Generally I don’t see this with the GPL the same way I do with the CDDL and BSD licenses.

    There is a good general discussion of this topic here:

    I very much believe in open source software, but I’m not a lawyer. The most important thing to me is having the free ability to use and learn from the very best technology in the world. I think the best way to do this is to have a free exchange of ideas and code between open source licenses.

  10. The screenshot with top shows 100.0%us and 0.0%sy, which would indicate an infinite loop within bash, not in the kernel code? What does strace say?

    I presume the ls completed reasonably, though, so readdir() should be fine..?

  11. The paper, “Exploiting Unix File-System Races via Algorithmic Complexity Attacks”, available from
    described hash collision attacks on the name-lookup caches implemented in Linux, Solaris, and FreeBSD. It furthermore showed a DoS attack (of sorts) in the OpenBSD default filesystem.

    The problem you describe is potentially much worse, though, since it presumably involves the on-disk format of btrfs. The cache problems can be fixed without changing on disk formats, but I’m not sure if that’s possible with the problems you’ve found in btrfs.

  12. Hm, could not reproduce this here (Ubuntu 12.10 in a VM, Linux 3.7.0-rc6+):

    $ grep disk /proc/mounts
    /dev/loop0 /mnt/disk btrfs rw,relatime,space_cache 0

    $ grep -A1 hack\ = ~/
    hack = False
    ITERATIONS = 1000
    $ time python2 ~/
    real 0m4.958s
    user 0m0.932s
    sys 0m3.856s

    $ ls | wc -l

    $ time rm -f *
    real 0m5.676s
    user 0m0.580s
    sys 0m4.816s


    $ grep -A1 hack\ = ~/
    hack = True
    ITERATIONS = 1000
    $ time python2 ~/
    real 0m8.529s
    user 0m3.728s
    sys 0m4.696s

    $ ls | wc -l

    $ time rm -f *
    real 0m6.622s
    user 0m0.588s
    sys 0m5.336s

    Indeed, removing even more files in a directory would 1) stress /bin/bash when expanding “*” and/or 2) result in E2BIG.

  13. I think you should recheck this problem.
    I ran your Python code on my machine, which uses EXT4 as its filesystem.
    And whenever I try to cat or rm a file, I get 100% CPU usage (almost the same as your result), and the process doing that is “bash”.
    You see, it is “bash”, not something about Btrfs.
    So I think it is just a bug in bash, not the filesystem.
    Maybe you can try it on some other machines with another filesystem; I think the result will be the same.

  14. @Pascal:
    unfortunately, just like @Ravi said, I CAN reproduce this on ext4. I also strongly suspect it is a bash issue when expanding “wrongly” encoded file names.

    If you want, you could check zsh or some other shell besides bash (sh is provided by bash on most systems, so you also need to pay attention to that).

    I have no problem with zsh.

  15. @Cliff:

    The problem is that it would actually be a copyright infringement (“illegal”) to combine ZFS with Linux. CDDL is a viral copyleft license just LIKE the GPL (almost), but different enough FROM the GPL that you can’t combine code under the two licenses without violating the copyright on both works individually.

    As well, it’s practically impossible to change the license on Linux, because of the sheer number of contributors who have copyright in it — all of whom would unanimously have to agree to allow ZFS to be merged in, and/or change the license on Linux to even a new version of the GPL. It’s stuck at GPL version 2 forever, and there’s no use in talking about people being stubborn because their hands are tied.

  16. I tested your script on up-to-date archlinux x64 with linux 3.7 without success.

    seblu-btrfs-1 /mnt/test/hack # python2 ~/
    seblu-btrfs-1 /mnt/test/hack # ls -l |wc -l
    seblu-btrfs-1 /mnt/test/hack # time rm -f *
    zsh: sure you want to delete all the files in /mnt/test/hack [yn]? y
    rm -f * 0.02s user 0.29s system 80% cpu 0.383 total
    seblu-btrfs-1 /mnt/test/hack # vi ~/
    seblu-btrfs-1 /mnt/test/hack # python2 ~/
    seblu-btrfs-1 /mnt/test/hack # ls -l |wc -l
    seblu-btrfs-1 /mnt/test/hack # time rm -f *
    zsh: sure you want to delete all the files in /mnt/test/hack [yn]? y
    rm -f * 0.03s user 0.30s system 84% cpu 0.391 total

  17. $ diff -Naurp
    --- 2012-12-19 11:23:45.015468677 +0530
    +++ 2012-12-18 18:43:08.227056155 +0530
    @@ -73,19 +73,27 @@ Use forge() to forge crc32 by adding 4 b
     # backward calculation of CRC up to pos, sets wanted backward CRC state
     bkd_crc = wanted_crc ^ 0xffffffff
     for c in s[pos:][::-1]:
         bkd_crc = ((bkd_crc << 8) & 0xFFFFFFFF) ^ crc.table_reverse[bkd_crc >> 24] ^ ord(c)

     # deduce the 4 bytes we need to insert
     for c in struct.pack('<L', fwd_crc)[::-1]:
         bkd_crc = ((bkd_crc << 8) & 0xFFFFFFFF) ^ crc.table_reverse[bkd_crc >> 24] ^ ord(c)

     res = s[:pos] + struct.pack('<L', bkd_crc) + s[pos:]
     return res

     if __name__ == '__main__':

    -    hack = False
    -    ITERATIONS = 10
    +    try:
    +        hack = int(sys.argv[1])
    +        ITERATIONS = int(sys.argv[2])
    +    except (ValueError, IndexError), e:
    +        hack = False
    +        ITERATIONS = 100
    +    print "hack: %d, ITERATIONS: %d" % (hack, ITERATIONS)
         crc = CRC32()
         wanted_crc = 0x00000000
         for i in range(ITERATIONS):

  18. I tested this on a 1 GB Btrfs-formatted USB drive with F17 + a 3.6.8 Linux kernel. Things seem to work fine, except that file deletion takes time proportional to the number of files. I tested:

    hack: 0, ITERATIONS: 100 -> files created: 5504
    hack: 1, ITERATIONS: 100 -> 5504
    hack: 1, ITERATIONS: 1000 -> 54179
    hack: 1, ITERATIONS: 5000 -> 270866

    It took a little over 1 hour to remove the files in the last run with: $ find . -empty -exec rm -vf {} \; because rm -f * exceeds the shell argument limit.
