A Fast and Versatile QKD System with Hardware Key Distillation and Wavelength Multiplexing

The QCrypt project, funded by the Nano-Tera program, is gently coming to an end. QCrypt involves ID Quantique SA in Geneva; the University of Geneva, through its Applied Physics Group; the EPFL, through the Telecommunications Circuits Laboratory; the ETH Zürich, through the Integrated Systems Laboratory; and the HES-SO, through two institutes of the HEIG-VD (REDS and IICT) as well as the hepia.

The main purpose of the QCrypt project was to build a next-generation quantum key distribution system integrated with a 100 Gb/s layer-2 encryptor relying on classical cryptography.

A first paper discussing the technical aspects of the QKD engine was recently uploaded to arXiv. Co-written by 20 (!) authors, it describes for the first time, to the best of our knowledge, the throughput of (distilled) key bits achievable in practice for a pre-defined security level, taking into account finite-key effects, authentication costs and the composability of keys. Here is the paper’s abstract:

We present a 625 MHz clocked coherent one-way quantum key distribution (QKD) system which continuously distributes secret keys over an optical fibre link. To support high secret key rates, we implemented a fast hardware key distillation engine which allows for key distillation rates up to 4 Mbps in real time. The system employs wavelength multiplexing in order to run over only a single optical fibre and is compactly integrated in 19-inch 2U racks. We optimized the system considering a security analysis that respects finite-key-size effects, authentication costs, and system errors. Using fast gated InGaAs single photon detectors, we reliably distribute secret keys with rates up to 140 kbps and over 25 km of optical fibre, for a security parameter of 4E-9.

Like a Hot Knife Through Butter

An interesting line of attacks against software has recently been revisited, namely hash-DoS: in a nutshell, exploiting weak hash functions used in a hash table implementation to trigger a denial of service.

To the best of my knowledge, this problem was exposed as early as 1998 in Phrack by Solar Designer. Variants were then discussed by Crosby and Wallach at USENIX Security 2003, formally defining algorithmic complexity attacks; by Klink and Wälde during 28c3 in 2011, applying the idea to PHP, Python, Java, Ruby, etc.; and more recently by Aumasson, Bernstein and Bosslet (see their slides from Appsec Forum 2012, and their upcoming talk at 29c3), showing that the proposed countermeasures, essentially randomizing the hash function, were not always as effective as expected.

Technically, this kind of attack consists of generating a large number of colliding inputs for the hash table. Instead of an O(1) average access time to a stored element, one can thus force the hash table into an O(n) one. If one then explores all the elements in the hash table, the worst-case complexity becomes O(n^2) instead of O(n). Being able to generate multi-collisions (i.e., many inputs, not just two, mapping to the same output) in practice hence depends on the properties of the hash function that maps elements to keys.
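The degradation is easy to observe on a toy chained hash table instrumented to count key comparisons. This is a minimal illustrative sketch (the class and its names are mine, not any real implementation); the "weak hash" here is simply the key length, so every equal-length key lands in the same bucket:

```python
# Minimal chained hash table, instrumented to count key comparisons.
class ChainedTable:
    def __init__(self, nbuckets, hash_fn):
        self.buckets = [[] for _ in range(nbuckets)]
        self.hash_fn = hash_fn
        self.comparisons = 0

    def insert(self, key, value):
        self.buckets[self.hash_fn(key) % len(self.buckets)].append((key, value))

    def lookup(self, key):
        for k, v in self.buckets[self.hash_fn(key) % len(self.buckets)]:
            self.comparisons += 1
            if k == key:
                return v
        return None

# A deliberately weak hash: every key of the same length collides.
weak = ChainedTable(nbuckets=256, hash_fn=len)

keys = ['key%03d' % i for i in range(500)]  # all of length 6 -> one bucket
for k in keys:
    weak.insert(k, None)
for k in keys:
    weak.lookup(k)

print(weak.comparisons / len(keys))  # prints 250.5: ~n/2 comparisons per lookup instead of ~1
```

With a well-distributed hash over 256 buckets, each lookup would have touched only a couple of entries; with all 500 keys chained in one bucket, a lookup scans half the chain on average, and scanning all keys costs O(n^2) overall.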

In this short post, I’d like to show how hash-DoS can be applied to the btrfs file system, with some astonishing and unexpected success. Btrfs, while still in its development stage, is widely considered a viable successor to ext4, and an implementation of it is already part of the Linux kernel. According to this page,

Directories are indexed in two different ways. For filename lookup, there is an index comprised of keys:

Directory Objectid BTRFS_DIR_ITEM_KEY 64 bit filename hash

The default directory hash used is crc32c, although other hashes may be added later on. A flags field in the super block will indicate which hash is used for a given FS.

The second directory index is used by readdir to return data in inode number order. This more closely resembles the order of blocks on disk and generally provides better performance for reading data in bulk (backups, copies, etc). Also, it allows fast checking that a given inode is linked into a directory when verifying inode link counts. This index uses an additional set of keys:

Directory Objectid BTRFS_DIR_INDEX_KEY Inode Sequence number

The inode sequence number comes from the directory. It is increased each time a new file or directory is added.

Knowing how trivial it is to compute multi-collisions for a CRC-based hash, I could not resist playing with it a bit. Roughly speaking, computing the CRC of an n-bit message M consists in interpreting the message as a polynomial M(x) of degree n-1 over \mathrm{GF}(2), dividing it by the CRC-defining polynomial P(x), and taking the remainder, hence writing something like \mathrm{CRC} = M(x) - Q(x)\cdot P(x). Obviously, adding any multiple of P(x) to the message M(x) will generate a collision. For the gory details, see this page.
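This algebra can be checked in a few lines. The sketch below is a toy model of the CRC as a bare polynomial remainder over \mathrm{GF}(2) (the real CRC32C additionally uses bit reflection, an initial value and a final XOR, but the collision principle is identical); the helper names poly_mod and clmul are mine:

```python
# Toy model: CRC as a bare polynomial remainder over GF(2).
POLY = 0x11EDC6F41  # CRC32C (Castagnoli) generator polynomial, degree 32

def clmul(a, b):
    """Carry-less (GF(2)) multiplication of two polynomials a(x)*b(x)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def poly_mod(m, p=POLY):
    """Remainder of m(x) divided by p(x) over GF(2)."""
    while m.bit_length() >= p.bit_length():
        m ^= p << (m.bit_length() - p.bit_length())
    return m

M = 0xDEADBEEFCAFEBABE  # an arbitrary message, seen as M(x)
crc = poly_mod(M)

# XORing in any multiple Q(x)*P(x) leaves the remainder unchanged:
for q in (1, 0b101, 0b1100111):
    assert poly_mod(M ^ clmul(POLY, q)) == crc
```

Since addition over \mathrm{GF}(2) is XOR and the remainder is linear, each modified message has exactly the same CRC as M, which is all an attacker needs to mass-produce colliding inputs.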

I basically found two different attacks:

  • I measured the time needed to create 4000 empty files with randomly chosen names in the same directory. This takes about 0.2 seconds. The box used was a Fedora distribution running within a VM (and btrfs was on a loopback device).

    Then, I measured the time needed to create those 4000 empty files in the same directory, whose names were however chosen to hash to the same CRC32C value. This operation fails after 5 (!) seconds, having created only 61 files. In other words, this first attack allows an adversary, in a shared-directory scenario, to prevent a victim from creating a file with a known-in-advance name. According to the btrfs maintainer, Chris Mason,

Collisions are a known issue with any of the hash based directories. […] The impact of the DOS is that a malicious user is able to prevent the creation of specific file names. For this to impact other users on the system, it must be done in directories where both the malicious user and the victim user have permission to create files. The obvious example is /tmp, but there are other cases that may be more site-specific. Group writable directories have other security issues, and so we picked the hash knowing this kind of DOS was possible. It is good practice to avoid the shared directories completely if you’re worried about users doing these kinds of attacks.

  • A bit annoyed by this answer, I tried harder and found the following: I first created around 500 files with random names in a directory. The time required to remove them is negligible. Then, I created the same number of files, but gave them only 55 different crc32c values. The time required to remove them is so large that I was not able to measure it and killed the process after 220 minutes (!). The Python script I used, which borrows some code from StalkR, is the following:

    More exactly, I mounted a 1GB btrfs file system on a loopback device:

    In the exploit script, just set the variable  hack = False  to generate random empty filenames, or  hack = True  to generate colliding filenames and hence trigger the phenomenon. Here is a screenshot of what I obtained:
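The collision trick such a script can rely on is worth sketching (this is my own minimal reconstruction, not the original script; the helper names are mine). It uses a classical CRC property: appending the current pre-final-XOR register state to a message drives the register to a fixed value, so every patched name ends up with the same CRC32C:

```python
import struct

def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def collide(name: bytes) -> bytes:
    """Append 4 bytes that drive the CRC register to a fixed state, so
    every returned name hashes to the same CRC32C value (0xFFFFFFFF)."""
    return name + struct.pack('<I', crc32c(name) ^ 0xFFFFFFFF)

names = [collide(b'file%04d' % i) for i in range(55)]
assert len({crc32c(n) for n in names}) == 1  # all 55 names collide
```

Note that the four appended bytes may contain NUL or '/', which are forbidden in file names, so an actual exploit has to discard such candidates (or solve for printable patch bytes instead).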

Given this result, it looks like playing with collisions is much more likely to trigger an infinite loop than a mere complexity increase; at least, the btrfs code surely does not expect to handle many collisions.

Essentially, to thwart both attacks, I would recommend using a modern lightweight hash algorithm, such as SipHash, instead of CRC32C. Another option is to avoid using data structures with a high worst-case complexity, like hash tables, for storing data that can potentially be manipulated by malicious users. Sacrificing a bit of average-case performance, data structures like red-black trees guarantee a search time in O(\log(n)) (I learned this while reading the source code of nginx).
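The keyed-hash idea can be sketched as follows. This is a hypothetical illustration, not btrfs code: since SipHash is not in the Python standard library, BLAKE2b in keyed mode stands in for it, and the names (SECRET, name_hash) are mine. The point is that without the per-filesystem secret, an attacker cannot precompute colliding names:

```python
import hashlib
import os

# Hypothetical per-filesystem secret; a real design would draw it once
# at mkfs time and store it in the superblock.
SECRET = os.urandom(16)

def name_hash(name: bytes) -> int:
    """64-bit directory-index key from a keyed hash (BLAKE2b standing in
    for SipHash): collisions cannot be precomputed without SECRET."""
    digest = hashlib.blake2b(name, key=SECRET, digest_size=8).digest()
    return int.from_bytes(digest, 'little')
```

Collisions still happen by the birthday paradox, but an adversary can no longer choose inputs that collide on purpose, which is exactly what defeats a hash-DoS.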

For the record, this vulnerability was disclosed to the btrfs maintainer Chris Mason on November 14th, 2012, who acknowledged the bug and mentioned that

 My plan is to get this fixed for the 3.8 merge window. We’re juggling a lot of patches right now but I expect to fix things then.

[UPDATE OF 17/12/2012] As several readers of this post have noticed (and I would like to warmly thank them for their feedback), the second attack does NOT trigger an infinite loop within the btrfs code, but merely within the bash expansion code responsible for expanding the command line rm *. This can be seen in the above screenshot, as the CPU is burnt in userland, not in the kernel. Hence, what I thought to be a complexity attack against the btrfs file system is actually a (less glamorous) complexity attack against bash.

This said, after I communicated this unfortunate glitch to the btrfs maintainer Chris Mason, he kindly answered me the following:

You’ve found a real problem in btrfs though. Changes since I tested the crc overflow handling have made us not deal with our EOVERFLOW error handling completely right, and there are cases where we force the FS readonly when we shouldn’t. So this is a valid bug, I’m just waiting on some review of my fix, which will get backported to a number of kernels.

To summarize, the message I wanted to convey through this post remains valid: if one uses a weak hash function, as is the case in the btrfs file system, one should assume that malicious users can generate zillions of collisions, and one should accordingly write robust code able to handle those collisions efficiently.

Another possibility consists in using a lightweight cryptographic hash function that turns the search for multi-collisions into a hard task. The security/performance trade-off to find here is definitely a delicate decision to take.

Finally, the first attack described, i.e., making it impossible to create a file with a given name within a shared directory, remains valid.

Keccak is the SHA-3 Winner

In a recent press release, the NIST has announced that the Belgian candidate Keccak has won the SHA-3 competition. The goal of the SHA-3 competition was to select and standardize a cryptographic hash function. It was launched after the sequence of attacks against many popular designs, such as MD5, initiated in 2004 by Xiaoyun Wang and her co-authors.

The SHA-3 competition ran from November 2007 to October 2012 and resulted in a fantastic amount of scientific work around the design, cryptanalysis and implementation of hash functions. From 64 candidates, the NIST selected a short list of 14 semi-finalists in July 2009, then 5 finalists (BLAKE, Grøstl, JH, Keccak and Skein) in December 2010.

According to the NIST press release, Keccak was chosen for the following reasons (a more detailed report from them about the rationale of their choice should appear soon):

“The NIST team praised the Keccak algorithm for its many admirable qualities, including its elegant design and its ability to run well on many different computing devices. The clarity of Keccak’s construction lends itself to easy analysis (during the competition all submitted algorithms were made available for public examination and criticism), and Keccak has higher performance in hardware implementations than SHA-2 or any of the other finalists.

“Keccak has the added advantage of not being vulnerable in the same ways SHA-2 might be,” says NIST computer security expert Tim Polk. “An attack that could work on SHA-2 most likely would not work on Keccak because the two algorithms are designed so differently.”

For me, the selection of Keccak is a semi-surprise, as it was not the finalist with the best implementation performance: BLAKE and Skein were indeed faster in software. However, Keccak demonstrated very good results when implemented in hardware. Furthermore, Keccak’s design is radically different from that of the other finalists, being based on the very modern cryptographic sponge construction; BLAKE and Skein are perhaps more classical in their approach, more similar to the SHA-2 family. Hence, one can probably interpret NIST’s decision as a kind of “not put all my eggs in the same basket” strategy, which is very wise.

I would like to warmly congratulate the Keccak designers Guido Bertoni, Joan Daemen, Michaël Peeters and Gilles Van Assche on their formidable success, and especially to take my hat off to Joan Daemen, who won the AES competition with Vincent Rijmen on October 2nd, 2000, exactly 12 years ago, and has now won the SHA-3 one. Joan has definitely evolved from the demi-crypto-god status to the ultimate crypto-god one! Finally, the planet must now recognize once and for all that, in addition to brewing the best beers in the world, Belgium is also home to many of the finest cryptographers. And never forget that the four non-selected hash functions are all extremely nice pieces of cryptographic engineering!