Overview
When storing a password or other sensitive data, a salt is a must. This post deals with the nitty-gritty of salt generation.
If you're here for simple advice, use bcrypt with a nice wrapper which handles salt generation for you.
Properties of a Salt
A decent salt must satisfy two minimal requirements:
- Unique
- Unpredictable
Uniqueness is fairly obvious: if your password database uses a static salt for all passwords, the salt only defeats pre-computed dictionaries and rainbow tables. It does not protect you once your database is leaked and the attacker knows the salt; they can simply re-compute the dictionary using that salt and often break many passwords this way. Having a different salt for each password forces the attacker to build a separate dictionary for every password in the database.
Unpredictability is less obvious. Take a well-known open-source system that uses a sequential integer as both the primary key in the database and its salt. This clearly satisfies uniqueness. However, if an account (say, admin) is always the first account on the system, it always uses the salt "1". An attacker can pre-compute their dictionary with salt "1"; when the database is leaked, they simply look up the hash value in the dictionary and read off the password.
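The attack on a predictable salt can be sketched in a few lines. This is only an illustration: the `salted_hash` helper, the candidate password list and the "leaked" admin hash are all made up for the example, and plain SHA-256 stands in for whatever hash the real system uses.

```python
import hashlib


def salted_hash(password: str, salt: str) -> str:
    # Hypothetical helper: plain salted SHA-256, chosen only to keep the
    # illustration short. A real system should use a slow KDF such as bcrypt.
    return hashlib.sha256((salt + password).encode("utf-8")).hexdigest()


# Before any leak: the attacker knows the first account always gets salt "1",
# so the lookup table for that salt can be built ahead of time.
candidate_passwords = ["123456", "password", "letmein", "hunter2"]
precomputed = {salted_hash(pw, "1"): pw for pw in candidate_passwords}

# After the leak: the admin row (made-up data) is a single dictionary lookup.
leaked_admin_hash = salted_hash("letmein", "1")
recovered = precomputed.get(leaked_admin_hash)
print(recovered)  # prints: letmein
```

The same table is useless against an account whose salt the attacker could not guess in advance, which is the whole point of the unpredictability requirement.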
This, however, presents a major problem: if I can't predict the value of a salt, how can I be sure it will not collide and violate uniqueness?
The first, naive solution is to simply throw away colliding salts. However, once you have a certain number of passwords, you'll spend all your time generating salts.
The better solution is to ignore collisions, but make them extremely unlikely. With sufficient randomness, we can also ensure that collisions happen extremely rarely, even in a system which uses sharding or other "splitting" methods. It is this solution that we will explore.
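A minimal sketch of the "ignore collisions" approach, assuming Python's standard `secrets` module as the randomness source and a 128-bit salt. Both the `new_salt` name and the 16-byte size are choices made for this example, not something prescribed at this point in the post.

```python
import secrets

SALT_BYTES = 16  # 128 bits; an assumed size for this sketch


def new_salt(num_bytes: int = SALT_BYTES) -> bytes:
    # Draw straight from the OS CSPRNG. There is deliberately no check
    # against previously issued salts: collisions are simply left to chance.
    return secrets.token_bytes(num_bytes)


# Issue 10,000 salts independently, as separate shards might, and count
# how many distinct values came out.
salts = [new_salt() for _ in range(10_000)]
print(len(set(salts)), "distinct salts out of", len(salts))
```

Because no lookup against existing salts is needed, each shard or server can generate salts on its own without coordinating with the others.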
Unique Violations
The primary strength of a salt lies with the fact that an attacker must attack each password individually, or for our purposes, almost individually.
If each password can be expected to share a salt with two or three other passwords, this is quite bad: every guess an attacker hashes against that salt is tested against several passwords at once.
Ideally, an adversary picking a salt at random from a leaked set of passwords should have a negligible chance of picking one that has suffered a collision. Attacking any single salt then gives them a negligible chance of breaking more than one password.
A simple manipulation of the Birthday Paradox allows us to determine the minimum bit-length for any salt conforming to this requirement. Starting from the birthday approximation $p \approx (n^2 - n) / 2^{b+1}$ and solving for b, the equation derived is:
$$b = \log_2(n^2 - n) - \log_2(p) - 1$$
Where p is the acceptable probability of a collision, n is the expected size of the password database and b is the bit-length of the salt.
So, for a system that's expected to hold 100,000 passwords, where an attacker should not expect more than one collision in that set, we would set p to $10^{-5}$:
$$b = \log_2(10^{10} - 10^{5}) - \log_2(10^{-5}) - 1$$
$$b = 48.83$$
Rounding up, that is about 49 bits of entropy, minimum.
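For anyone who wants to plug in their own numbers, the equation above translates directly into a small function. This is a sketch: the name `min_salt_bits` and the input checks are additions for the example, and the inputs shown are the figures substituted into the equation above (n = 10^5, p = 10^-5).

```python
import math


def min_salt_bits(n: int, p: float) -> float:
    """Minimum salt length in bits: b = log2(n^2 - n) - log2(p) - 1."""
    if n < 2:
        raise ValueError("need at least two salts for a collision")
    if not 0 < p < 1:
        raise ValueError("p must be a probability strictly between 0 and 1")
    return math.log2(n * n - n) - math.log2(p) - 1


# The figures used in the worked example above.
b = min_salt_bits(10**5, 1e-5)
print(round(b, 2))  # prints: 48.83
```

Since a salt cannot have a fractional number of bits, the result should be rounded up (for instance with `math.ceil`) before picking a length.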
Existing Standards
NIST and RSA both give recommendations on the minimum salt length to be used in any key derivation function. NIST (NIST-SP800-132) recommends at least 128 bits, while RSA (PKCS #5 v2.1) recommends only 64 bits.
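As a rough illustration of what following the 128-bit recommendation looks like with a key derivation function, here is a sketch using PBKDF2 from Python's standard library. The function names, the SHA-256 choice and the iteration count are assumptions made for the example; in particular the iteration count is illustrative only and should be tuned for your own hardware.

```python
import hashlib
import hmac
import secrets

SALT_BYTES = 16        # 128 bits, the NIST minimum quoted above
ITERATIONS = 100_000   # assumed work factor for this sketch only


def hash_password(password: str) -> tuple[bytes, bytes]:
    # A fresh random salt per password, stored alongside the derived key.
    salt = secrets.token_bytes(SALT_BYTES)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    # Re-derive with the stored salt and compare in constant time.
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(key, expected)


salt, key = hash_password("correct horse")
print(len(salt) * 8, "bit salt;", verify_password("correct horse", salt, key))
```

The salt is not secret, so it is stored in the clear next to the derived key.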
Most bcrypt implementations, when asked to generate a salt themselves, will return a 128-bit salt, which is more than enough and happens to comply with NIST's recommendation.
It is probably the case that RSA's recommendations are not sufficient for use in large organisations.
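To put rough numbers on that, the equation from the Unique Violations section can be rearranged to give the collision probability for a fixed salt length. The sketch below compares 64-bit and 128-bit salts; the function name and the database sizes are picked purely for illustration.

```python
def collision_probability(n: int, bits: int) -> float:
    # Birthday approximation rearranged for p: p ~= (n^2 - n) / 2^(bits + 1).
    # Capped at 1.0 because the approximation overshoots for very large n.
    return min(1.0, (n * n - n) / 2 ** (bits + 1))


for n in (10**5, 10**7, 10**9):
    for bits in (64, 128):
        p = collision_probability(n, bits)
        print(f"n={n:.0e}  bits={bits:3d}  p ~ {p:.2e}")
```

With these illustrative sizes, a 64-bit salt only starts to show a noticeable collision probability (a few percent) around a billion passwords, while a 128-bit salt stays negligible throughout.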
Conclusions
Simply put, if in doubt, just use bcrypt.
However, for your organisation, you can easily calculate the minimum bit-length required to keep salt collisions at a rate you are willing to accept. I would always plonk myself down on the safe side of town and go with NIST's recommendation of 128 bits.
You should always bear in mind that poor random number generation from your OS or other source can lead to more collisions. The post above assumes good, uniformly distributed salts.