Recently, somebody leaked several millions of password hashes stolen to LinkedIn. Most security specialists have noticed with some surprise (or not…) that the hashes were not salted. In this post, I’d like to discuss how to store securely passwords in a database.
Let us assume that a website has to manage credential data of several users, that we will name with . Each user possesses a password allowing to authenticate him/herself. Of course, several users might share the same password. For the sake of concreteness, we will also assume that people choose passwords of at most 8 characters out of the set . It means that an attacker willing to explore the whole passwords space will have at most
different passwords to try.
We now discuss several ways to store the passwords in a database, from the least secure to the most secure one.
Storing Passwords in Clear
The least secure way consists in storing tuples in the database. Unfortunately, any vulnerability of the web application allowing to dump the database (like an SQL injection) will reveal the passwords in clear. As people tend to use the same password on several websites, leaking credentials put at risk the access to those other websites as well.
Some developers like to store passwords in clear in order to give the possibility to their users to recover their password in case where they would have forgotten it. This is a bad decision: one should give a user the ability to reset his/her password, but not to recover it. Troy Hunt has written a nice blog post on the subject.
The first step towards more security consists in hashing the passwords, i.e., applying a cryptographically secure hash function on the passwords, such as SHA-256 or SHA-512. LinkedIn implemented this solution, with help of SHA-1 (A a sidenote: this hash function is known to be sub-optimal in terms of security, as it is conjectured to be possible to compute a collision in steps instead of the expected ones; however, its security with respect to the recovery of first pre-images (i.e., inverting it) is still OK. Nevertheless, as of today, it is recommended to avoid SHA-1 for new systems, even if it is not necessary to urgently upgrade existing applications. Notice however that migrating SHA-1 to SHA-256 or SHA-512 can be considered as a good security practice).
Hashing passwords, however, is not sufficient, as an adversary could perform the following attack, named a dictionary attack: first, he selects a set of passwords that are the most likely to be chosen by the users, hashes them, and store the pairs , sorted by the values, in a giant database. This phase of the attack can be performed off-line, and needs to be done only once. The off-line phase costs about hashes and memory cells for different passwords. The second step is immediate: given a password hash, one does a single lookup into the giant database, and one immediately recovers the password if it is stored into the database. Hence, the on-line phase costs only hash evaluation and one database query, so it’s lightning fast. For a password space of size , the off-line phase would cost hash evaluations, which is easily tractable in 2012 using common hardware. The storage requirements are however less easy to deal with: one needs around bytes of storage, which represents , i.e., more than 2000 drives of each.
A second, more interesting possibility for an attacker consists in mounting an attack exploiting a time-memory trade-off, i.e., building rainbow tables: during the off-line phase of the attack, the attacker computes a set of tables that cost him hash evaluations and memory cells. Once this step, which has to be done only a single time, has been performed, recovering a password costs only hash computations. Hence, a time-memory trade-off allows to trade some memory, which is typically a costly resource, with cheaper hash evaluations. For passwords, it means that the adversary will spend in th order of hash evaluations for building tables costing around memory cells, and once this step is done, cracking a password will require only about hash evaluations. Here, the figures are a bit less precise, as they heavily depend on the optimizations used in the implementation of the rainbow tables. However, one can note that it is definitely possible to mount this attack in practice given even a small hardware budget.
Hashing and Salting Passwords
In order to thwart dictionary attacks and time-memory trade-offs, one usually salts the passwords. Basically, salting a password means that you compute a hash value which depends on the password and on a salt, which is a non-secret random value of sufficient length, e.g, 32 or 64 bits. Concretely, for each user-password pair , you generate at random a salt value , and you store the triple in the database, where denotes the concatenation.
Salting passwords has two crucial consequences: first, two colliding passwords will not collide in the database, as long as the respective salts do not collide. Second, an attacker willing to mount a dictionary or a time-memory trade-off attack does not know the salt in advance, so it must build dictionaries or tables for every possible salt. Consequently, the time and memory complexities are multiplied by the possible number of salt values: given possible different salt values, dictionary attacks and time-memory trade-offs will require and memory cells, respectively, as well as hash evaluations during the off-line phase. Two crucial assumptions on the salt values for a salting scheme to be efficient are the following: salt values must be random in a cryptographic sense (they must be unpredictable for the attacker), and they must come out of a set of sufficient cardinality, i.e., 2-bit salt values won’t do the job. Coming back to our space of passwords, even a -bit salt will render dictionary and time-memory trade-offs unpractical.
Using Iterated Hash Functions
Given a database of properly hashed and salted passwords, the attacker can still mount a brute-force attack. It means that, given a triple , the attacker will try every possible candidate password one after the other until it finds the good one. Moreover, this attack can easily be parallelized on multiple CPU or GPU cores. Given a space of possible passwords, the attacker will have to invest hash evaluations in the worst case, and in average.
To slow down an attacker, one can apply several iterations of the hash functions on the password, instead of a single one. For instance, one could marry the PBKDF2 key derivation mechanism with a hash function such as SHA-256, and fix the number of iterations such that hashing a password takes half a second on a modern CPU core, for instance. Hence, an attacker will be only able to compute two hashes per second on a core, instead of several millions.
Using Memory-Hungry Iterated Hash Functions
Eventually, brute-forcing passwords can be considerably accelerated with help of a dedicated ASIC or FPGAs. In order to raise the costs for an (already quite-powerful) adversary, people have noted that memory is considerably more expensive than computational power. Starting from this observation, one can use hashing mechanisms that have been designed to need significantly more memory than a common hash function. For instance, the BSD hacker Colin Percival has designed the scrypt key derivation function with the goal of better resisting brute-force attacks implemented in hardware. An older, but more popular and widespread scheme, is bcrypt. It has been designed by Niels Provos and David Mazières.
Users are the Weakest Link
It is well-known that users tend to pick easy-to-remember passwords, hence passwords having a small entropy. Consequently, it is important to remember that if your password database leaks, the adversary can face two very different situations:
- She/he is willing to crack at least one password out of the list, but which one is not important. In that case, even using very resistant password hashing schemes will not help you a lot, as it is very likely that at least one password is very weak. Enforcing a strong password policy is then a good step in the proper direction, however, it is often delicate to define the good trade-off between security and user friendliness.
- She/he is willing to crack a precise password: here, using a good scheme can help you, provided the precise password is not too weak.
As a conclusion, developers can take some technical measures in order to make their password-based authentication mechanism more resistant, however, a good defense-in-depth strategy consists in considering the password hashes as sensitive data, and do everything possible to protect them from being leaked!