At the end of March 2022, we discovered a flaw in one of the core cryptographic building blocks of the Swiss Post E-Voting System, more precisely in the specifications of the recursive hash function it uses. Several system components which are critical to guarantee the confidentiality and the integrity of the votes, such as non-interactive zero-knowledge proofs and digital signatures, rely on this function.

Interestingly, the issue sheds some light on a gap between how cryptographers, cryptography standards and textbooks define cryptographic hash functions, and respectively how cryptography engineers and developers use them in practice.

Indeed, hash functions typically expect arbitrary sequences of bytes as input while developers usually work with typed items. Those encompass simple values such as numbers but also complex structures conveying a lot of meaning. A necessary step before being able to hash these items consists in transforming them into a byte sequence. While this looks like a trivial software design decision at first sight, the way this step is implemented is not without consequences regarding security.

In this blog post, we first illustrate this gap in the context of a small and simple scenario involving a farmers’ market and electronic vouchers. Starting with three naive ways to encode objects, we show how important it is to correctly serialize structured data before hashing. We then describe how a serialization issue led to a flaw in the recursive hash function of the Swiss Post E-Voting System. Finally, we discuss some of the consequences of its discovery.

At the Farmers’ Market

Let’s imagine a (somewhat modern) farmers’ market consisting of both a cashier and a delivery counter where people can obtain baskets of fruits. For simplicity, we will assume that it is possible to buy six different types of baskets: apples 🍎, bananas 🍌, cherries 🍒, lemons 🍋, oranges 🍊 and pears 🍐.

In order to buy fruits, a person must first go to the cashier and order a number of baskets. Once the customer has paid, a voucher in the form of a QR code is printed by the point of sale system. The customer then goes to the delivery counter, where an employee scans the voucher before delivering the order.

We will assume that the cashier and the delivery counter cannot be connected together and, as a consequence, the developer of the system had no other choice but to encode the order data in the QR code. To prevent the risk of forged vouchers, the QR code contains a digital signature covering the order data. Of course, an invalid signature triggers an alarm on the employee’s voucher scanner.

Signing a Farmers’ Market Voucher

Let us have a closer look at how the digital signature could be implemented. We will assume that the developer is able to select a proper algorithm (such as e.g., RSA-PSS, ECDSA or EdDSA) and to choose a cryptographic library implementing this algorithm and storing secret keys in a safe way.

The developer however still has two decisions to make:

  • determine what data needs to be signed;
  • and define how to transform this data into a byte sequence, the input format typically expected by cryptographic libraries.

It is important to recall that digital signature algorithms do not process data as is, but they hash them and compute a signature on the hash digest, because they cannot handle arbitrary long messages on their own. This works, because cryptographic hash functions (e.g., SHA-2, SHA-3 or BLAKE2) are designed to guarantee certain security properties. In particular, they are resistant to collision attacks: it must be practically infeasible for an adversary to find two different, but arbitrary messages \(M\neq M’\) that hash to the same digest, i.e., such that \(H(M) = H(M’)\). Furthermore, cryptographic hash functions are resistant to second preimage attacks: given a message \(M\), it must be practically infeasible for an adversary to find a different message \(M’\neq M\) hashing to the same digest \(H(M) = H(M’)\). These properties are essential for digital signatures to be secure, as the signature is actually computed on the digest, and not on the data in a direct way.

We now turn to the first decision that the developer must take, namely defining what data needs to be signed.

The Horton Principle

Deciding what data need to be signed in a given scenario is related to the (little-known) Horton principle, a cryptographic engineering design rule put forward by Wagner and Schneier in their Analysis of the SSL 3.0 protocol paper published in 1996, and further discussed in Ferguson et al., Cryptography Engineering - Design Principles and Practical Applications, 2nd edition, Wiley, 2010:

Authenticate what is being meant, not what is being said.

In other words, a secure system must authenticate not only a message \(M\), but also all the information that is necessary to parse into its meaning. We come back to our farmers’ market scenario for a concrete illustration.

Domain Separation

The minimal amount of information that the software developer might naively decide to sign could be an array containing six integers which encode the amount of baskets of apples, respectively bananas, cherries, lemons, oranges and pears. This would not be a good solution:

  • there is missing information about the nature of the articles: if a butcher begins to sell pork chops, rump steaks, prime ribs, rib eyes, mock tenders and ranch steaks at the market, then using the system without modification would potentially allow customers to pay for fruits but then to try to get a meat delivery;
  • there is missing information about the price of individual articles: a customer could obtain a voucher on a given day, keep it for a while, and only go for a delivery after a price increase;
  • there is no indication that a voucher has already been redeemed: let us recall that the cashier and the delivery counter terminals are not connected, therefore the delivery counter needs to have a way to figure out whether a voucher has already been used;
  • there is missing information about the farmers’ market location: if another, more expensive farmers’ market is using the same system in a different town, then one customer could order in the cheapest location and get the delivery in the upscale one;
  • etc.

For the system to be secure, the signature must therefore cover all the above information, and possibly more. This ensures a property called domain separation: partitioning the signature input data (the six integers) to different application domains so that no input is assigned to more than one domain.

Data Serialization

Let now talk about the second decision faced by the software developer: how to transform the data into a byte sequence. As a reminder, data can be represented according to a given syntax, i.e., “rules explaining how to write them”, and they possess precise semantics, i.e., “what they actually mean” as well.

The Horton Principle also implies that a message must be encoded in a non-ambiguous way in terms of semantics before being processed by a cryptographic hash function. In other words, the encoding must be injective: starting from an encoded message, the decoding procedure must ensure that it is not possible to get two decoded messages having a different meaning. Ferguson et al. are very clear about this point in their book:

To recap: whenever you do authentication, always think carefully about what other information should be included in the authentication. Be sure that you code all of this information, including the message, into a string of bytes in a way that can be parsed back into the fields in a unique manner.

As a technical side note, this requirement is also valid when computing message authentication codes (MACs), authenticated encryption with associated data (AEAD), non-interactive zero-knowledge proofs involving the Fiat-Shamir heuristic or processing customization strings with a key derivation function.

Let us recall our first naive solution, that is simply signing an array of six integers. Here are three arbitrary methods to encode those six integers in a byte sequence:

  • Concatenate the six numbers encoded as ASCII characters;
  • Concatenate the six numbers encoded as 8-bit unsigned integers;
  • Build a JSON object mapping fruit names to quantities.

In the following, we will assume that a customer has bought 1 basket of apples, 3 baskets of cherries, 12 baskets of bananas and nothing else. The developer must hence encode the array [1, 3, 12, 0, 0, 0].

Here is an illustration of the first choice:

rechash_case2.svg

On the left side of the picture, the array ["1", "3", "12", "0", "0", "0"] is encoded as the byte sequence 31333132303030. However, this way of doing is ambiguous: using this method, the array ["13", "1", "2", "0", "0", "0"] is encoded as the same byte sequence, and therefore, hashes to the same digest!

The second choice, concatenating the 6 numbers encoded as 8-bit unsigned integers, is illustrated here:

recursive-hashing-case1.svg

This approach does not suffer from the concatenation issue, however, the only hashed semantics are the numbers of baskets of each category. Most of the remaining information related to domain separation is missing.

The third approach is somewhat better, but some information related to the whole system is still missing.

recursive-hashing-case3.svg

Serialization is by definition a semantically injective encoding scheme, since one must be able to deserialize serialized data without any ambiguity in terms of meaning. Hence, a possible approach to partly enforce the Horton Principle consists in serializing data to be hashed using e.g. JSON, CBOR, XML, Protobuf or any other serialization scheme and to cryptographically process the serialized data.

Back to our farmers’ market scenario, this could concretely translate as follows:

  1. build a JSON object with all necessary information to ensure full domain separation;
  2. hash and sign the byte sequence representing the JSON object and embed it with the signature in the QR code;
  3. in the voucher scanner, recover from the QR code the signature and the byte sequence representing the JSON object, hash the latter and verify the signature.

Serialization answers, at least partially, the question “How should I encode data before hashing them?”. However, standard serialization schemes usually only offer to encode a finite number of generic data types. It might happen that developers encode custom, structured types as byte sequences, losing on the way some information about the precise type semantics.

As a side note, serialization is a complex topic in itself, too, and even popular serialization schemes suffer from traps and pitfalls.

Cryptographic Standards Only Care About Syntax, Not Semantics

Our naive farmers’ market scenario illustrates the gap between how cryptography primitive designers and (almost all) cryptography standards and software toolkits define hash functions and their APIs, and how developers use them in practice: cryptographers, standards and software toolkits define hash functions as taking a byte sequence as input with no, or only very little attached semantics, while cryptography developers work with data that has a well-defined meaning. As an illustration, here is an excerpt found in Chapter 2 of A Graduate Course in Applied Cryptography, by Dan Boneh and Victor Shoup:

In practice, keys, messages, and ciphertexts are often sequences of bytes. […] Keys, messages, and ciphertexts may also be other types of mathematical objects, such as integers, or tuples of integers (perhaps lying in some specified interval), or other, more sophisticated types of mathematical objects (polynomials, matrices, or group elements). Regardless of how fancy these mathematical objects are, in practice, they must at some point be represented as sequences of bytes for purposes of storage in, and transmission between, computers.

It is natural for cryptographers to abstract messages to be encrypted, signed or hashed as being byte sequences, as it is infeasible to take all possible use cases and their attached meanings into account.

Let’s have a look at some real-world examples:

  • standardized families of hash functions such as SHA-2, SHA-3 and BLAKE2 hash sequences of bytes;
  • standardized signatures schemes, such as DSA, ECDSA, RSA-PSS, or EdDSA, sign (previously hashed) sequences of bytes;
  • standardized message authentication codes, such as CBC-MAC or HMAC, authenticate sequences of bytes;
  • standardized authenticated encryption schemes, such as AES-GCM or ChaCha20-Poly1305, authenticate associated data consisting of sequences of bytes;
  • Standardized key-derivation schemes, such as HKDF, process context information as sequences of bytes.

TupleHash forms a partial exception: it is a standardized hash function derived from SHA-3 that is able to process arrays of byte sequences in a safe way. Hence, TupleHash would have provided two different digests for the arrays containing integers encoded in ASCII ["1", "3", "12", "0", "0", "0"] and ["13", "1", "2", "0", "0", "0"].

To summarize: all the above cryptographic standards implicitly assume that the user has the responsibility to ensure that the input byte sequences have unambiguous semantics.

That said, there exists at least one standard proposal, floating in the Ethereum blockchain ecosystem, which defines how to hash structured data together with its semantics: the EIP-712 standard.

For the record, how to hash structured data is a problem that designers of cryptographic hash functions have already met and solved in the past when working on parallel hashing modes, i.e., some internal modes of operations of hash functions that are optimized to exploit multiple CPU cores. The interested reader might have a look at the 2009 paper by Bertoni et al., Sufficient Conditions for sound tree and sequential hashing modes.

Flawed Custom Constructions

In this part, we would like to focus on a family of designs which have repeatedly been found to be flawed: the recursive hash function specified in the context of the Swiss Post E-Voting system. The design of this function has its roots in the CHVote system.

CHVote Recursive Hash

The CHVote system is an open-source electronic voting system developed by the Swiss Canton of Geneva. When in production, it was one of only two accredited electronic voting systems by the Federal Council in Switzerland. It has been offered to nearly 125’000 voters in 4 cantons (Geneva, Bern, Lucerne and Basel), for voting as well for electing at the communal, cantonal and federal levels.

Over the years, the CHVote system has been audited multiple times; the specifications and source code have been published such that external researchers, and more generally, anybody interested in understanding them and finding flaws could get access to the necessary information.

In 2018, its development, as well as the development of its successor CHVote 2.0 (remained at the proof-of-concept stage), were discontinued, mainly for financial reasons. Mid-2019, the authorities have stopped using CHVote v1.0.

The first version of the recursive hash function was specified by Haenni et al. in the CHVote Protocol Specification, Version 1.0:

rechash.png

This construction defines a (recursive) mode of operating a cryptographically secure hash function supporting structured data with different types (sequence of bytes, Unicode strings, big integers, and vectors of arbitrary items).

The function definition was updated for CHVote 2.0, with the addition of more supported object types such as relative big integers and matrices:

rechash1.png

Interestingly, it is mentioned in a footnote in the version 3.2 of the specification document that trivial collisions can be found:

footnote.png

As a matter of fact, many other collisions exist, as this function fails to take into account the type of each object before hashing: for instance, a big integer and a Unicode string represented both by the same byte sequence will hash to the same digest. We are not aware of anybody having noticed this at that time, or it was possibly thought to be unimportant.

Interestingly, the design of this recursive hash function has then been borrowed by the Swiss Post E-Voting System.

Swiss Post E-Voting Recursive Hash First Version

The Swiss Post E-Voting system is derived from a system designed by the private company Scytl which was at the time selling a competing system of CHVote. The Swiss Post E-Voting system is currently the only electronic voting system that is actively developed in Switzerland. The Federal Chancellery and the Swiss Post expect to be able to put the system in production sometime in 2023.

Accordingly with the legal requirements, the Swiss Post has implemented an extensive security review process of its system, involving the publication of the specifications and the source code, a bug bounty program as well as multiple audits over several years by several world-leading university professors and industry experts. As of June 15th, 2022, the Swiss Post has received 140+ reports from external security researchers and awarded EUR 109’000 of bug bounties.

However, the fact that two objects with different semantics could have the same byte representation, and therefore would hash to the same value was one again overlooked. This is undesirable for a secure collision-resistant hash function and can potentially cause harmful effects: for instance, a valid signature of a big integer should not be valid in a context expecting a Unicode string. In other words, it’s a clear violation of the Horton Principle.

The lack of domain separation was outlined for the first time by Sarah Jamie Lewis in a Twitter thread in March 2021.

tweet.png

Swiss Post E-Voting Recursive Hash Second Version

As a response, the Swiss Post E-Voting system designers have rapidly fixed the recursive hash function definition to take into account the type of hash objects and to ensure that “the RecursiveHash function is collision-resistant across different input domains”.

rechash2.png

We have unfortunately recently noticed that it is still trivial to find collisions even with the fixed version: for instance, the following objects (represented in Python syntax) collide:

  • the first one is an array containing a sequence of bytes and an array containing two sequences of bytes
[b'\x13\x90\xf5&\xc8\xe3\x88?\x8av\x91\x01w', 
  [b'I\x9c\x04\\\xc3\xfe\xd9t\xfd\x05p\x84\xe6UZ\x13H', 
   b"\xcb\xe8\x9a>?o\x88'W\xf8\x8b\xc3\xbe'\x91%\x08\xef\xb5"
  ]
]
  • the second one is an array containing two sequences of bytes
[
  b'\x13\x90\xf5&\xc8\xe3\x88?\x8av\x91\x01w', 
  b'\x9a^\x11\xd8,e\rw\xb1$\xfc+\x13%Y\'l\xd7H\xbcu\xbd\xde\xb4\xfd\x95\xbb\x0ee\x00^\x15"
  \xb5\x93\xb1u\x0b\x16\xd2U9F\xdcH\xc9\xdfA\xfb\xfb\x0f\x8b
  \x8fP\xed9\x8e\x99\x00W\x11`\xa3'
]

Computing such collisions requires only a negligible amount of CPU time and can be done thanks to the following simple algorithm:

def find_collision():
    w0 = os.urandom(13)
    w1 = os.urandom(17)
    
    digest_w1 = recursive_hash(w1)

    while digest_w1[0] != 0:
        w1 = os.urandom(17)
        digest_w1 = recursive_hash(w1)

    w2 = os.urandom(19)
    digest_w2 = recursive_hash(w2)

    crafted_data = digest_w1[1:]+digest_w2

    input_1 = [w0, [w1, w2]]
    input_2 = [w0, crafted_data]

    print(f"Input data 1 = {input_1}")
    print(f"Digest 1 = {recursive_hash(w0, [w1, w2])}\n")

    print(f"Input data 2 = {input_2}")
    print(f"Digest 2 = {recursive_hash(w0, crafted_data)}\n")

Once again, the flaw stems from a violation of the Horton Principle: arrays were not assigned a specific hashing domain.

We have responsibly disclosed this issue to the Swiss Post. We would like to thank them for the very smooth bug reporting process and the bug bounty.

Impact of the Flaw on the Swiss Post E-Voting System

We have not been able so far (nor have we tried very hard) to identify a way to exploit this flaw to break the confidentiality or integrity of the vote.

However, several mathematical security proofs of the system (cf. Protocol of the Swiss Post Voting System — Computational Proof of Complete Verifiability and Privacy, version 0.9.11) assume that the recursive hash function is collision-resistant: Lemmas 2 (”vote compliance”) and 3 (”correct vote extraction”), Theorems 1 (”sent as intended”) and 4 (”correct tally”) all depend on the fact that for an adversary, the probability to find collisions is negligible.

The flaw described above invalidates the security guarantees offered by these mathematical proofs, as the derived upper bounds on the adversary’s success become meaningless.

Fortunately, there exist at least two paths to handle this issue:

  • fix the recursive hash function definition and prove it is collision-resistant;
  • develop new security proofs for Theorems 1 and 4, if it is possible, that do not involve any assumption about the hash function’s collision resistance.

At the time of writing this post, this issue is still considered to be open.

Conclusion

Hashing structured, semantically rich data in a safe way is one of the responsibilities left to cryptography engineers; it is a delicate and error-prone task. Unfortunately, this topic is rarely discussed in cryptography textbooks and standards, and while many experienced cryptographers and cryptography engineers know about this kind of pitfalls, violations of the Horton Principle will for sure be a source of real-life software defects for a certain time.

We would like to thank Dominique Bongard for his many constructive comments and improvement proposals regarding this blog post.