How Do Passwords Work?

2022-01-14

I was curious about passwords and wrote this high-level summary for friends. I am not an expert.

Authentication is how systems decide that you are who you say you are. If someone fools your bank, all of your money could be stolen. If someone fools your laptop, all of your vacation pictures from Peru (and perhaps much more) would be compromised.

But authentication has less intense uses too: personalizing your YouTube feed, helping you save an Amazon wishlist, and making your day-to-day time online more convenient. It's a basic part of the modern web.

Modern systems use a variety of different authentication methods to keep your accounts safe. But none is more common today than the ordinary password.

The humble password 🔑

In the beginning, authentication was simple: all you needed was a unique username for your account and a password to protect that account from would-be snoopers:

username: arunkprasad
password: hunter2

Surely nobody would guess that your password is hunter2, the most secure of all passwords. But if some intrepid fraudster did chance upon it, the system wouldn't have any clue, and it would gladly serve up your treasured hoard.

As time went on and the Internet grew, it became more common for people to have multiple accounts on different websites. And why ruin a good thing by using different passwords? It's so much easier to use the same one:

# AOL
username: arunkprasad
password: hunter2

# Neopets
username: arunkprasad
password: hunter2

# battle.net
username: arunkprasad
password: hunter2

But there is a serious problem here: what happens if those passwords are leaked?

You see, a website is ultimately just a program running on a computer called a server, and that server has to store those usernames and passwords somewhere. Usually a server stores this information in a big table with all of the users and passwords it needs to keep track of:

username    | password
~~~~~~~~~~~~+~~~~~~~~~~~~~~~~~~~~~~~
arunkprasad | hunter2
mjordan23   | falcon_eagle_raptor
mscott      | password1234
...

But the problem is that these servers are also protected with a password. And if someone knows the password to the server ...

8.3 million plaintext passwords exposed in DailyQuiz data breach (2021)

Oops!

And because people often reuse the same password on many websites, these password leaks put millions of accounts at risk, maybe even your own.

Hashes 🍟

How can we make sure our passwords are safe against leaks? One clever form of protection is to use something called a hash function.

In general, functions are ways of converting one thing into another: inches to centimeters, pounds to kilograms, and things like that. If you convert your weight in pounds to kilograms, you can easily convert it back from kilograms to pounds:

# One kilogram is roughly 2.2 pounds.
def kilograms_to_pounds(weight_in_kilograms):
    return weight_in_kilograms * 2.2

# Dividing by the same factor undoes the conversion.
def pounds_to_kilograms(weight_in_pounds):
    return weight_in_pounds / 2.2

Easy! But hash functions have a special property: in one direction, they're quick and easy to run. But in the other, they're very difficult and slow. So these functions are called one-way functions because they can't easily be undone.

As a real-world example of this, imagine taking a completed jigsaw puzzle and throwing it on the floor. That's easy; any toddler could do that, and many toddlers have. But putting a puzzle together again isn't quite so easy.

This leads to another special property of hash functions: they create the same messy output every time, so they're still predictable. In other words, they are deterministic:

>>> hash_function("password")
'5f4dcc3b5aa765d61d8327deb882cf99'

>>> hash_function("password")
'5f4dcc3b5aa765d61d8327deb882cf99'

>>> hash_function("password")
'5f4dcc3b5aa765d61d8327deb882cf99'

And there's one more property that makes them useful: very small changes in the input to the function create radically different outputs. So in practice, there's no way to tell what the original password is just by looking at the hash:

>>> hash_function("password")
'5f4dcc3b5aa765d61d8327deb882cf99'

>>> hash_function("password1")
'7c6a180b36896a0a8c02787eeafb0e4c'

>>> hash_function("password12")
'c24a542f884e144451f9063b79e7994e'
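The hash_function in these examples isn't built into Python. The outputs shown match MD5 written out as hex text, so a minimal sketch of it, assuming the password is encoded as UTF-8, might look like this:

```python
import hashlib


def hash_function(password: str) -> str:
    """Return the MD5 hash of a password as 32 hex characters."""
    # Hash functions work on bytes, so encode the text first.
    # UTF-8 is an assumption; the examples only use plain ASCII.
    return hashlib.md5(password.encode("utf-8")).hexdigest()


# Same input, same output, every time.
print(hash_function("password"))  # 5f4dcc3b5aa765d61d8327deb882cf99

# One extra character gives a completely different result.
print(hash_function("password1"))
```

MD5 is used here only because it matches the examples. It is not a good choice for storing real passwords.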

So rather than storing ordinary passwords in plain text like we did before, we can get the password's hash:

>>> hash_function("hunter2")
'2ab96390c7dbe3439de74d0c9b0b1767'

And use that instead:

username    | hash
~~~~~~~~~~~~+~~~~~~~~~~~~~~~~~~~~~~~
arunkprasad | 2ab96390c7dbe3439de74d0c9b0b1767
...

Whenever we log in, the server will take our password, apply the hash, and check it against the hash in its table. And if there's a match — presto! We're in.
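Here is a rough sketch of that login check. The USER_HASHES dictionary and the check_login function are names I made up for illustration, standing in for the server's real table and login code:

```python
import hashlib
import hmac


def hash_function(password: str) -> str:
    """MD5 as hex text, matching the examples in this post."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


# Hypothetical stand-in for the server's table: username -> stored hash.
# The plain password itself is never stored.
USER_HASHES = {
    "arunkprasad": hash_function("hunter2"),
}


def check_login(username: str, password: str) -> bool:
    """Hash the submitted password and compare it to the stored hash."""
    stored_hash = USER_HASHES.get(username)
    if stored_hash is None:
        return False  # no such user
    # compare_digest takes the same time whether or not the strings
    # match, so response timing doesn't leak hints about the hash.
    return hmac.compare_digest(hash_function(password), stored_hash)


print(check_login("arunkprasad", "hunter2"))  # True
print(check_login("arunkprasad", "hunter3"))  # False
```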

If the server's table is ever leaked, all a hacker will get is the hash 2ab96390c7dbe3439de74d0c9b0b1767. And since there's no easy way to work backward from a hash to its password, our account is totally safe.

Right?

Dictionary attacks 📚

Reversing a hash function is difficult, but it can be done by brute force. That is, we can check every single possible password and see if it creates the hash we're trying to crack.

This is difficult and slow, but it's not impossible. And if you think about a database with millions of hashed passwords, a hacker wouldn't need much effort to crack at least a few accounts open. This is because most people don't choose strong passwords — they tend to choose short or common ones.
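To make the brute-force idea concrete, here is a small sketch that tries every lowercase password up to a given length. The brute_force function and its max_length parameter are my own invention for illustration, and real attacks use far faster tools:

```python
import hashlib
import itertools
import string
from typing import Optional


def hash_function(password: str) -> str:
    """MD5 as hex text, matching the examples in this post."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def brute_force(target_hash: str, max_length: int = 4) -> Optional[str]:
    """Try every lowercase password up to max_length characters."""
    for length in range(1, max_length + 1):
        # product() yields every combination: a, b, ..., aa, ab, ...
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            guess = "".join(letters)
            if hash_function(guess) == target_hash:
                return guess
    return None  # not found within the search space


leaked_hash = hash_function("cab")
print(brute_force(leaked_hash, max_length=3))  # cab
```

Each extra character multiplies the work by 26 here, so 8 lowercase letters already means about 209 billion guesses. That is why short passwords fall first.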

So while the number of passwords a hacker needs to try is large, it isn't infinite. If we have a reasonable list and calculate the hash for each one:

a —> 0cc175b9c0f1b6a831c399e269772661
b —> 92eb5ffee6ae2fec3ad71c777531578f
c —> 4a8a08f09d37b73795649038408b5f33
d —> 8277e0910d750195b448797616e091ad
e —> e1671797c52e15f763380b45e841ec32
...

Then we can take the result and turn it around like so:

0cc175b9c0f1b6a831c399e269772661 —> a
92eb5ffee6ae2fec3ad71c777531578f —> b
4a8a08f09d37b73795649038408b5f33 —> c
8277e0910d750195b448797616e091ad —> d
e1671797c52e15f763380b45e841ec32 —> e
...

And the result is a mapping from hashes to passwords. Since this kind of attack uses a fixed list (also called a dictionary) of passwords, it is called a dictionary attack.
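In code, that reversed mapping is just a dictionary keyed by hash. This is a toy sketch: the COMMON_PASSWORDS list and the leaked table are made-up data, and real lists hold millions of entries:

```python
import hashlib


def hash_function(password: str) -> str:
    """MD5 as hex text, matching the examples in this post."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def build_lookup(candidates):
    """Map each candidate's hash back to the candidate itself."""
    return {hash_function(password): password for password in candidates}


# Hypothetical word list of common passwords.
COMMON_PASSWORDS = ["123456", "password", "qwerty", "hunter2"]
lookup = build_lookup(COMMON_PASSWORDS)

# Hypothetical leaked table of username -> hash.
leaked = {
    "arunkprasad": hash_function("hunter2"),
    "mjordan23": hash_function("falcon_eagle_raptor"),
}

# Every leaked hash that appears in the lookup is cracked instantly.
cracked = {user: lookup[h] for user, h in leaked.items() if h in lookup}
print(cracked)  # {'arunkprasad': 'hunter2'}
```

The table is built once and then reused against every leaked database that uses the same hash function.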

In practice, it's too difficult to store all of these hashes conveniently. But even so, there are tricks such as rainbow tables to make this approach more feasible. And since most people tend to use relatively simple and straightforward passwords, this kind of attack is often successful and can compromise millions of accounts.

Salts 🧂

There are two tactics we can use to thwart a dictionary attack.

The first is to add something called a salt to our password. The basic idea is simple. A hacker who's using a dictionary attack can check a large number of hashes against the server's database. But what if the server adds a few extra letters to the end of the password? For example, what if we add $secret to the end of each password?

hunter2 + $secret —> hunter2$secret

Remember: small changes in the password make big changes in the hash. So with this small change, we get an entirely new hash that the hacker almost certainly doesn't have:

hunter2$secret —> 2810a0351f7ce919ebfa4c9be5e75265

If we use this for every password in our database, then the hacker's tables are useless.

But what if the hacker knows the salt? Then the hacker could just remake all of their tables, and we're back where we started.

The trick is to use a different salt for each user. For example: when we store our password in the database, we also include the salt at the end:

username    | hash_and_salt
~~~~~~~~~~~~+~~~~~~~~~~~~~~~~~~~~~~~
arunkprasad | 2810a0351f7ce919ebfa4c9be5e75265$secret
...

Now whenever I log in with hunter2, the server can check the hash_and_salt, use the salt ($secret) to create my hash (2810a0351f7ce919ebfa4c9be5e75265), and check that hash against the database.

But suppose I have a doppelganger who uses the same password as me. Without salts, our passwords would look the same:

username    | hash
~~~~~~~~~~~~+~~~~~~~~~~~~~~~~~~~~~~~
arunkprasad | 2ab96390c7dbe3439de74d0c9b0b1767
drsaunapark | 2ab96390c7dbe3439de74d0c9b0b1767
...

But suppose his salt is $abcdef. Then his hash would be very different:

hunter2$abcdef —> c08c64f4f874f75be8617175ff4e6d55

And it would be stored differently in the database:

username    | hash_and_salt
~~~~~~~~~~~~+~~~~~~~~~~~~~~~~~~~~~~~
arunkprasad | 2810a0351f7ce919ebfa4c9be5e75265$secret
drsaunapark | c08c64f4f874f75be8617175ff4e6d55$abcdef
...

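So even though both of us use the same password, they look completely different to a hacker. There is no obvious way for a hacker to tell that these two hashes came from the same password, and a table built in advance no longer helps: the hacker would have to redo the work for every single salt. The precomputed dictionary attack fails!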
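Here is a sketch of per-user salts that follows the same hash$salt layout as the tables above. The make_record and verify names are mine, and the random 16-character salt is an assumption (the examples use short readable salts to keep things clear):

```python
import hashlib
import hmac
import secrets


def hash_function(password: str) -> str:
    """MD5 as hex text, matching the examples in this post."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def make_record(password: str) -> str:
    """Create a 'hash$salt' string with a fresh random salt."""
    salt = secrets.token_hex(8)  # 16 random hex characters
    salted_hash = hash_function(password + "$" + salt)
    return salted_hash + "$" + salt


def verify(password: str, record: str) -> bool:
    """Re-hash the password with the stored salt and compare."""
    stored_hash, salt = record.split("$", 1)
    candidate = hash_function(password + "$" + salt)
    return hmac.compare_digest(candidate, stored_hash)


# Two users with the same password get different records.
mine = make_record("hunter2")
his = make_record("hunter2")
print(mine != his)              # True
print(verify("hunter2", mine))  # True
print(verify("hunter2", his))   # True
```

The salt isn't a secret. It only needs to be different for each user so that one precomputed table can't cover everybody.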

Slower hash functions 🐌

Once our salts are in place, there is another tactic we can use to thwart a dictionary attack: use a slower hash function.

The function I've used in the examples above, called the MD5 hash function, is one of the simplest and fastest in common use. Modern hardware can compute it billions of times a second to make guesses about what a password might be. For these and other reasons, secure websites don't use MD5 hashes anymore.

Modern applications tend to use hash functions like bcrypt and scrypt, which are deliberately much slower than MD5. They achieve this slowness by demanding extra computation and, in scrypt's case, extra memory. So there's a tradeoff here: more security at the cost of more resources. But many companies are willing to pay the price to protect their users' security.
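Python's standard library includes scrypt (on builds linked against a recent OpenSSL), so a slow hash can be sketched without extra packages. The slow_hash name and the cost settings below are illustrative assumptions, not tuned recommendations:

```python
import hashlib
import hmac
import secrets


def slow_hash(password: str, salt: bytes) -> bytes:
    """Hash a password with scrypt, which is slow on purpose."""
    return hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt,
        n=2**14,  # work factor: higher means slower and more memory
        r=8,      # block size
        p=1,      # parallelism
        maxmem=64 * 1024 * 1024,  # allow up to 64 MiB for this call
        dklen=32,  # length of the output in bytes
    )


salt = secrets.token_bytes(16)  # fresh random salt per user
stored = slow_hash("hunter2", salt)

# Checking a login repeats the same slow computation.
print(hmac.compare_digest(slow_hash("hunter2", salt), stored))  # True
print(hmac.compare_digest(slow_hash("hunter3", salt), stored))  # False
```

One login barely notices the delay, but a hacker making billions of guesses pays that cost on every single one.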

How to choose a password 🔒

I'm hardly an expert, but here are principles I try to follow:

  • Choose long passwords. These are less likely to be in the search space a hacker would use, so they're less susceptible to dictionary attacks. And consider using passphrases (such as mint jericho wine party), which are easier to remember (relevant XKCD).

  • Use a different password for each website you use. Password managers are a great help for this.

  • Use two-factor authentication for all of your accounts, or at least the critical ones (email, banking, etc.).
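For the passphrase idea, here is a small sketch of picking words at random. The WORDS list is a tiny made-up sample, and make_passphrase and guesses_needed are hypothetical helpers:

```python
import secrets

# Tiny hypothetical word list; a real one would have thousands of words.
WORDS = ["mint", "jericho", "wine", "party", "lantern", "orbit", "velvet", "canyon"]


def make_passphrase(words, count: int = 4) -> str:
    """Pick words with a cryptographically secure random choice."""
    return " ".join(secrets.choice(words) for _ in range(count))


def guesses_needed(list_size: int, count: int) -> int:
    """Number of possible passphrases a hacker would have to search."""
    return list_size ** count


print(make_passphrase(WORDS))
print(guesses_needed(7776, 4))  # 3656158440062976
```

With a Diceware-style list of 7,776 words, four random words already give about 3.6 quadrillion possibilities. The words have to be chosen randomly for that math to hold.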

Ultimately, no account is absolutely safe. If the government wants access to your account by whatever means necessary, they'll get it. What you can do is make your account hard enough to break into that nobody would bother.