I had some thoughts about password hashing and password strength, and would like to share those with you. First we’ll look into hashing and then into the topic of commonly used passwords, password strength, and possible ways to enforce strong passwords.
We all know you shouldn’t store passwords in plain text but should hash them instead, which is indeed the right way to go about things. It seems like bcrypt is the standard way to do so, and it’s the default used in PHP. However, bcrypt has an input limit of 72 bytes, and anything beyond that just gets stripped away. So it’s tempting to first pass the password through SHA256 to allow virtually unlimited password lengths, and I’ve seen people do that. But you probably shouldn’t, and here are some of my thoughts as to why.
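To make the 72-byte limit concrete, here is a small Python sketch that only simulates the truncation. bcrypt itself needs a third-party package, so it is not called here. The sketch keeps the first 72 bytes of the UTF-8 encoding, which is the only part of the input that influences a bcrypt hash. The helper name `bcrypt_effective_input` is hypothetical.

```python
# Minimal sketch: simulate bcrypt's 72-byte input limit (bcrypt itself is NOT called).
BCRYPT_MAX_BYTES = 72  # only the first 72 bytes of the key influence a bcrypt hash


def bcrypt_effective_input(password: str) -> bytes:
    """Return the part of the UTF-8 encoded password that bcrypt would actually use."""
    return password.encode("utf-8")[:BCRYPT_MAX_BYTES]


if __name__ == "__main__":
    base = "a" * 72
    # Two different passwords that only differ after byte 72...
    p1 = base + "first-suffix"
    p2 = base + "second-suffix"
    # ...have identical effective input, so bcrypt would accept either one.
    print(bcrypt_effective_input(p1) == bcrypt_effective_input(p2))  # True

    # 72 characters are not always 72 bytes: "ü" takes 2 bytes in UTF-8.
    umlauts = "ü" * 72
    print(len(umlauts), len(umlauts.encode("utf-8")))  # 72 144
```

This is also why the limit should be described in bytes rather than characters: a password made of non-ASCII characters hits it much sooner.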
You have probably heard of the pigeonhole principle.
“In mathematics, the pigeonhole principle states that if n items are put into m containers, with n > m, then at least one container must contain more than one item.”— Wikipedia on Pigeonhole principle
The output of SHA256 has a fixed size, and thus a fixed number of possible values. The number of possible inputs, however, is infinite, which means some different inputs must produce the same hash. What this means is that eventually, although it’s an edge case, a wrong password could actually match and let you in. SHA256 produces 256 bits of output, which is 256 / 8 = 32 bytes.
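Python's standard `hashlib` shows the pigeonhole argument in numbers. The digest is always 32 bytes no matter how long the input is. There are already more distinct 33-byte inputs than possible 256-bit digests, so collisions must exist. Actually finding one in practice is a very different matter.

```python
import hashlib

# SHA-256 always produces 256 bits = 32 bytes, regardless of input length.
for message in (b"", b"monkey", b"x" * 10_000):
    digest = hashlib.sha256(message).digest()
    print(len(message), "->", len(digest))  # digest length is always 32

# Pigeonhole: possible digests vs. possible inputs of just 33 bytes.
possible_digests = 2 ** 256
possible_33_byte_inputs = 256 ** 33  # = 2 ** 264
print(possible_33_byte_inputs > possible_digests)  # True, so some inputs must share a digest
```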
I think bcrypt’s 72 bytes of input (72 ASCII characters, fewer if multi-byte UTF-8 characters are used) are probably better for password strength than a 32-byte hash, and they don’t add an extra hashing step with its own collisions. 32 bytes allow 2^256 (about 1.16 × 10^77) possible values, but keep in mind that those 32 bytes were derived from a password made of printable characters too, so they can’t contain more entropy than that password. You could now go ahead and check how many possibilities fit into 72 characters if you include all printable characters in UTF-8.
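The comparison suggested here takes only a few lines of Python. This sketch makes three assumptions:

- It counts only the 95 printable ASCII characters (space through `~`), not all printable UTF-8 characters.
- It treats every character as chosen uniformly at random, which real passwords never are.
- It expresses both sides in bits, so the password search space can be compared directly with SHA-256's 256-bit output.

```python
import math

# Printable ASCII: code points 0x20 (space) through 0x7E (~), 95 characters.
PRINTABLE_ASCII = [chr(c) for c in range(0x20, 0x7F)]
ALPHABET_SIZE = len(PRINTABLE_ASCII)  # 95


def search_space_bits(length: int, alphabet_size: int = ALPHABET_SIZE) -> float:
    """log2(alphabet_size ** length): bits needed to count every string of that length."""
    return length * math.log2(alphabet_size)


if __name__ == "__main__":
    print(f"72 printable ASCII chars: {search_space_bits(72):.1f} bits")
    print("SHA-256 output: 256 bits")
    # Shortest length at which random printable-ASCII passwords exceed 2**256 possibilities.
    print("break-even length:", math.ceil(256 / math.log2(ALPHABET_SIZE)))
```

Under these idealized assumptions, 72 random printable characters (roughly 473 bits) give a larger space than a 256-bit digest. Around 39 random characters already match it.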
Furthermore, there are diminishing returns in going beyond 72 characters. On top of that, most password managers I know don’t even go that high (KeePassX allows 64, KeePassXC 128), and most users won’t use more than 72 characters either (although user error is a weak argument, I know).
Conclusion about double-hashing
I think the focus should instead be on “banning” common passwords, patterns and repetition, and on enforcing long passwords. Likewise, giving users a false sense of security with ridiculously long passwords that get hashed down to 32 bytes should be avoided. Simply allow passwords of up to 72 bytes and pass them to bcrypt as is, with a salt. In PHP, that is done for you with the function
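The length part of that advice could look like the following minimal Python sketch, regardless of which language does the actual hashing. It checks a password before it is handed to bcrypt unchanged. A few things here are my own assumptions:

- the function name `check_password_length`;
- the 12-character minimum, taken from the 10 to 12 range suggested further down;
- rejecting over-long input rather than silently truncating it.

The hashing call itself is left out.

```python
BCRYPT_MAX_BYTES = 72
MIN_LENGTH = 12  # assumption: upper end of the 10-12 characters suggested in this post


def check_password_length(password: str) -> list[str]:
    """Return a list of problems; an empty list means the password can go to bcrypt as is."""
    problems = []
    if len(password) < MIN_LENGTH:
        problems.append(f"must be at least {MIN_LENGTH} characters")
    # bcrypt's limit is in bytes, so measure the UTF-8 encoding, not the character count.
    if len(password.encode("utf-8")) > BCRYPT_MAX_BYTES:
        problems.append(f"must not exceed {BCRYPT_MAX_BYTES} bytes in UTF-8")
    return problems


if __name__ == "__main__":
    print(check_password_length("monkey123"))               # too short
    print(check_password_length("tangerine-orbit-velvet"))  # no problems
    print(check_password_length("ü" * 40))                  # 80 bytes, too long
```

Rejecting over-long input with a clear message avoids the situation where two passwords that only differ after byte 72 are treated as the same.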
Some info on commonly used passwords
Twitter, for example, has a small list of about 400 common passwords that you are not allowed to use (which makes you wonder how they generated that list). At the same time, it doesn’t enforce long passwords, since it allows passwords of just 6 characters (that’s ridiculously short). And I don’t know whether Twitter looks for patterns or only checks its short list. Probably just the list, because if it banned patterns it wouldn’t need most of the passwords on it. Overall, Twitter seems to fail at enforcing strong passwords.
If you want to make sure no common passwords are used, you could download a dump of real-world breached passwords, hashed with SHA1, to compare passwords against (it’s almost 12 GB in size…) here: https://haveibeenpwned.com/Passwords (I don’t recommend using their API; I might elaborate on that in a later post). But that would require building a small password-checking service of its own.
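If you do go down that road, the core of such a checking service is small. The sketch assumes each line of the downloaded file has the form `SHA1HASH:COUNT`, with the hash in uppercase hex. Check the format of the file you actually download. The sketch hashes the candidate password with SHA-1 and streams through the lines.

- SHA-1 is only used here because the dump is distributed that way; it is not meant for storing passwords.
- A linear scan over roughly 12 GB is slow, so a real service would put the hashes into an index or database first.
- The function names and file name are hypothetical.

```python
import hashlib
from collections.abc import Iterable


def sha1_hex_upper(password: str) -> str:
    """SHA-1 of the UTF-8 password as uppercase hex, matching the assumed dump format."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


def breach_count(password: str, lines: Iterable[str]) -> int:
    """Return how often the password appears in the dump (0 if not found).

    `lines` can be an open file object, so the dump is streamed instead of loaded into memory.
    """
    target = sha1_hex_upper(password)
    for line in lines:
        hash_part, _, count = line.strip().partition(":")
        if hash_part.upper() == target:
            return int(count) if count else 1
    return 0


if __name__ == "__main__":
    # Tiny stand-in for the real dump, with made-up counts.
    sample_dump = [f"{sha1_hex_upper('monkey')}:5\n", f"{sha1_hex_upper('qwerty')}:7\n"]
    print(breach_count("monkey", sample_dump))                  # 5
    print(breach_count("tangerine-orbit-velvet", sample_dump))  # 0
    # With the real file (hypothetical name):
    # with open("pwned-passwords.txt", encoding="ascii") as f:
    #     breach_count(candidate, f)
```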
So maybe look for patterns instead. Examples are number sequences like 345678, keyboard sequences like qwerty, or ridiculously common passwords like monkey. (If you enforce longer than 6 characters, which you should, monkey and the other examples couldn’t even be used as passwords in the first place. But monkey123 could, and it contains both a number sequence and a common password you could detect.)
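Here is a rough sketch of such pattern checks in Python. The following are illustrative assumptions rather than a vetted policy, and all helper names are hypothetical:

- the three-word list;
- the US QWERTY letter rows;
- a minimum run of 3 for digits and 4 for keyboard rows.

A real checker would need a much larger word list, more keyboard layouts and probably leetspeak normalization.

```python
COMMON_WORDS = {"monkey", "password", "dragon"}  # hypothetical, tiny stand-in list
KEYBOARD_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]  # US QWERTY letter rows only
DIGITS = "0123456789"
MIN_DIGIT_RUN = 3     # assumption: flags "123" as in monkey123
MIN_KEYBOARD_RUN = 4  # assumption: flags "qwer", but not every 3-letter coincidence


def has_sequence(password: str, alphabet: str, min_run: int) -> bool:
    """True if `password` contains `min_run` consecutive characters of `alphabet`, forwards or backwards."""
    lowered = password.lower()
    for seq in (alphabet, alphabet[::-1]):
        for start in range(len(seq) - min_run + 1):
            if seq[start:start + min_run] in lowered:
                return True
    return False


def find_patterns(password: str) -> list[str]:
    """Return human-readable reasons why the password looks weak (empty list if none found)."""
    reasons = []
    if has_sequence(password, DIGITS, MIN_DIGIT_RUN):
        reasons.append("number sequence")
    if any(has_sequence(password, row, MIN_KEYBOARD_RUN) for row in KEYBOARD_ROWS):
        reasons.append("keyboard sequence")
    lowered = password.lower()
    if any(word in lowered for word in COMMON_WORDS):
        reasons.append("common password")
    return reasons


if __name__ == "__main__":
    for candidate in ("345678", "qwerty", "monkey123", "tangerine-orbit-velvet"):
        print(candidate, find_patterns(candidate))
```

Returning reasons instead of a plain yes/no makes it easy to tell the user why a password was rejected.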
There are many considerations to make, and it’s harder for you as a developer to make sure your users pick strong passwords than it is for you as a user to pick one.
Some thoughts on password strength
GDPR is usually read as requiring you to enforce strong passwords, but I don’t think it states anywhere what that means exactly. Many people seem to interpret this as at least 8 characters, but I think that’s still weak. In theory I would suggest at least 16 or 20, but in practice that might just cause users to repeat their password until the required length is reached. So perhaps 10 to 12 makes sense? You could try to detect repetition, but at some point it just turns into a cat-and-mouse game.
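Detecting the simplest kind of repetition is cheap: a short chunk repeated until the minimum length is met. The sketch uses the classic trick that a string consists of a repeated unit exactly when it appears inside itself doubled, with the first and last characters removed. The 12-character minimum and the function names are assumptions. This obviously won't catch smarter variations, which is where the cat-and-mouse game starts.

```python
MIN_LENGTH = 12  # assumption: within the 10-12 range discussed above


def is_repeated_unit(password: str) -> bool:
    """True if the password is one shorter chunk repeated, e.g. "abcabcabcabc".

    Classic trick: s is a repetition of a proper substring exactly when s
    occurs in (s + s) with its first and last characters removed.
    """
    return len(password) > 1 and password in (password + password)[1:-1]


def passes_length_and_repetition(password: str, min_length: int = MIN_LENGTH) -> bool:
    """Minimum length plus a rejection of trivially repeated passwords."""
    return len(password) >= min_length and not is_repeated_unit(password)


if __name__ == "__main__":
    for candidate in ("hunter2hunter2", "abcabcabcabc", "tangerine-orbit-velvet", "short"):
        print(candidate, passes_length_and_repetition(candidate))
```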
I think it all boils down to user error, and to educating people about using strong passwords and making them aware of password managers.
[email protected] is not secure, even though it ticks all the usual boxes for character types and length used in practice.
I hope you enjoyed my thoughts and I’d like to hear yours.