One thing that's worth remembering about randomly generated tokens is that it's important to always use safe comparison methods when comparing them to the stored one - otherwise you could be vulnerable to timing attacks.
I am wondering how that relates to searching for the token in a database (index). Does it still matter? I would assume the time might depend on where the token sits in the index, or even on whether it is in the index at all — I'm not sure how much, though, or what to do about it.
Yes, it matters - timing attacks could absolutely work against an indexed column in a database.
The trick I've used for this is to have tokens that look like this:
"234523:7a561002f780e23853bdfbd89ae79bf2"
Then you have database entries like this:
id = 234523
token = 7a561002f780e23853bdfbd89ae79bf2
When a token comes in you use the bit before the : to look up the database record by its indexed ID, then perform a safe comparison between the rest of the token and the value you retrieved from the database.
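A minimal sketch of that lookup-then-compare flow in Python, using an in-memory dict as a stand-in for the indexed table (the names and the dict itself are illustrative, not from the comment):

```python
import hmac

# Hypothetical stand-in for the database table: indexed id -> stored secret part.
TOKENS = {
    "234523": "7a561002f780e23853bdfbd89ae79bf2",
}

def verify(presented: str) -> bool:
    """Split on ':', look up by the indexed id, then constant-time-compare
    the secret part against what's stored."""
    try:
        token_id, secret = presented.split(":", 1)
    except ValueError:
        return False  # malformed token, no ':' present
    stored = TOKENS.get(token_id)
    if stored is None:
        return False
    # Constant-time comparison of the secret halves
    return hmac.compare_digest(stored, secret)
```

The index is only ever consulted with the non-secret id, so any timing variation in the lookup reveals nothing about the secret half.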
That way you’re hashing the user input, which sanitizes it. It also protects against timing attacks.*
And most importantly it protects against credentials leaking. Even insiders who can read a token from the database can’t use it to authenticate, because the app logic expects a pre-image of that value.
*HMAC resists timing attacks because the attacker can no longer control the bit pattern on either side of the equality check.
Give me a break -- we're just arguing semantics at this point. My whole premise was that tokens are a secret, just like passwords are a secret, and that tokens should be treated like a first-class secret. Not that they're absolutely "the same." The article gives the illusion that Simple Random Tokens are safe to store as plain text. I was arguing that it's not, and that the token should be ran through an HMAC function at the very least to prevent plain text storage and to harden against timing attacks when querying. Nowhere was I implying that passwords and tokens are strictly identical in all aspects. I've worked at places where tokens are stored in plain text (i.e. almost everywhere I've worked) and it's a bad practice.
This is not true at all. This whole discussion is pretty confused. All of the mitigations we apply to passwords are premised on passwords being (1) cryptographically weak, (2) irrevocable, and (3) infectious to other systems. Nerds are, for reasons I will never understand, hyperfixated on "salting". Randomized password hashes primarily address an old attack that is more or less irrelevant to any modern hash or KDF. But that attack, too, mattered because (1), (2), and (3) hold for passwords.
Same here. These are all examples of the same thing: if you get your hands on the secret, you can do anything.
The only thing the post explains is that there is a difference between checking a secret against the database vs. message validation, which is also based on a secret. But in both cases, if you have the secret, you're doomed.
Expiration / immediate revocation is more difficult with message validation than with a distributed cache, which supports key eviction/expiry/removal.
Also, an asymmetric solution has similar problems: if you get the private key from the client, you're doomed. So it does protect the server part, but if people have access there, you're even more doomed. But at least it'll be easier to identify who had a security leak.
I think the difference may be in the "They’re easily revoked and expired" part.
Passwords are usually one-per-resource, e.g. a user has a single password. If that password is compromised you can reset it, but all the consumers that used it will need the new credential.
Whereas tokens are typically one-per-consumer, so if one is compromised you can revoke/expire just that token, without affecting other consumers.
Same with expiry times -- you can set that on a token without it expiring all access to that resource.
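A toy illustration of that per-consumer model (the token names and store are hypothetical): each consumer holds its own token, so one can be revoked or expired without touching the others.

```python
import time

# Illustrative per-consumer token store
tokens = {
    "tok-ci":     {"consumer": "ci",     "expires": time.time() + 3600},
    "tok-mobile": {"consumer": "mobile", "expires": time.time() + 3600},
}

def revoke(token: str) -> None:
    # Remove just this one credential
    tokens.pop(token, None)

def valid(token: str) -> bool:
    entry = tokens.get(token)
    return entry is not None and entry["expires"] > time.time()

revoke("tok-ci")  # only the CI consumer loses access; mobile keeps working
```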
Not sure if that's what the author had in mind, but it's a difference in how these things are often used (even if fundamentally they are otherwise very similar).
Part of the problem here is that these discussions mean a bunch of different things when they draw the equivalence between passwords, tokens, and keys. Sometimes we're talking about the secret storage problem; sometimes we're talking about the bearer token problem; sometimes we're talking about brute-force, human-recall, and reuse problems.
Frustratingly, different tokens have different levels of exposure to all of these problems. A well-confined Macaroon has essentially none of the secret storage problems of a password. An API request authenticating key has none of the bearer token problems. No API token has the human-recall, reuse, and brute-force problems.
People hyperfixate on the secret storage problem, I think because they feel like they can get their heads around it. As you can probably tell, I'm loath to relitigate the debate about whether we should store passwords with secret salts. Even when the discussion is legitimately focussed on that problem, I find discussions about it go to weird, incoherent places that aren't really informed by real threat models, and are really hair-splitting arguments about the security of environment variables vs. the security of filesystems, just dressed up as top-3 security considerations.
Even for trivial basic-auth API keys, the secret storage problem is not the same as that of a password. Part of the problem with passwords is that they effectively cannot be revoked; "revoked" passwords creep back into systems, and, worse, infect other systems. That's not how API keys work; the threat model is different and so are the countermeasures that are profitable to deploy for them.
Another thing that I feel happens in discussions like this is that there's no notion of cost-benefit. There is option A, and option B, and option B is on some axis superior to option A, even if that benefit is marginal. Since there's no engineering cost to consider, there's no meaningful discussion. But in reality, there's always a cost: deploying a countermeasure at a minimum incurs the opportunity cost of not deploying some other countermeasure that could have been built with the same (finite) engineering resources. Since these discussions always seem to get mired in secret-salt double-hashing hmac(hmac(root-secret, measurement_1 || measurement_2), secret) stuff, I hope you can see the concern: the discussion essentially advocates for ever-more-marginal wins that "fit" the message board thread, rather than seriously considering the real problem.
But of course, the other problem is that the arc of the thread bends towards HMAC'ing your Biscuit token. "Tokens are just passwords", after all.
At any rate: this thread says, "passwords are secrets, tokens are secrets, ergo passwords are tokens". Signing keys are also secrets. Nobody HMACs them. Something is wrong with the syllogism. I'm not that interested in picking apart what.
Serious question - what makes the database serialization not vulnerable to timing attacks in the same vein? I wouldn't expect those to be purely constant time implementations.
You can’t perform a timing attack for a token “foo” in SELECT WHERE token = :token if the token stored in the DB is the HMAC of “foo”. E.g. trying “f” and then “fo” produce 2 entirely different, random tokens from the query’s POV. The attacker could never deduce that the correct token is “foo.”
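A small sketch of that query pattern using SQLite in memory (the key and table are made up for illustration) — the `WHERE` clause only ever sees HMAC digests, so near-miss guesses like "f" and "fo" map to unrelated values:

```python
import hashlib
import hmac
import sqlite3

KEY = b"hypothetical-server-key"  # illustrative only

def mac(token: str) -> str:
    return hmac.new(KEY, token.encode(), hashlib.sha256).hexdigest()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sessions (token TEXT PRIMARY KEY)")
db.execute("INSERT INTO sessions VALUES (?)", (mac("foo"),))

def lookup(presented: str):
    # "f", "fo", and "foo" hash to unrelated digests, so the index walk
    # leaks nothing about how close the guess is to the real token.
    return db.execute(
        "SELECT 1 FROM sessions WHERE token = ?", (mac(presented),)
    ).fetchone()
```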
Most naive string comparison methods (including typical language equality expressions) will compare one character at a time until a mismatch is found. This allows an attacker to build up the correct password one character at a time: the correct letter will take longer to check than an incorrect one.
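What such a naive comparison looks like spelled out — this hand-rolled version mirrors what many `==` implementations do internally, returning at the first mismatch so the running time grows with the length of the matching prefix:

```python
def naive_equals(a: str, b: str) -> bool:
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            # Early exit: time spent so far reveals how long the
            # matching prefix was.
            return False
    return True
```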
But in the "real world" wouldn't any such time difference be very small and fluctuations in network delay and measurement error so large in comparison that this isn't typically exploitable?
Sure, better be safe and use a safe comparison method and that's simple enough, but still, how realistic is such an attack over the public internet?
The trick with timing attacks is that you don't measure the time taken for a single request. You send the same request thousands of times and take an average of the response time, which lets you pick away at the secret one character at a time.
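A sketch of that sampling idea (function names are mine; no claim this succeeds against a real network target) — repeat the same comparison many times and average, so per-character timing differences can emerge from the noise:

```python
import statistics
import time

def naive_equals(a: str, b: str) -> bool:
    # Early-exit comparison, the kind a timing attack targets
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True

def avg_ns(guess: str, secret: str, n: int = 10_000) -> float:
    # Average n timings of the same comparison; an attacker would compare
    # this average across candidate guesses for each character position.
    samples = []
    for _ in range(n):
        t0 = time.perf_counter_ns()
        naive_equals(guess, secret)
        samples.append(time.perf_counter_ns() - t0)
    return statistics.mean(samples)
```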
Sure, that's what you'd attempt to address errors, but that still doesn't convince me. That only gets you so far. My intuition says that you won't be able to distill any meaningful results in practice across long distances and requests being routed to different backend servers under varying load. You are trying to isolate a very specific duration in the order of microseconds.
Of course, I'm not in the field and am happy to learn more. However, my intuition also tells me we should have seen more realistic experiments and results over the years to get a clear picture of the extent to which such an attack is actually feasible and when it ceases to be.
The difference is probably below 1 microsecond. A modern CPU running at 3GHz roughly performs 3 instructions every nanosecond. It varies a lot, but a character-by-character comparison, especially one that's happening often and is thus cached, is one of the most trivial use cases, so let's assume this nominal throughput.
I’d estimate a naive comparison loop to be around 20-30 cycles, for compare and control? And let’s wrap it in 250 cycles more because someone decided to use Python. 300 cycles, then, are about 100ns (tenth of a microsecond).
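The arithmetic behind that estimate, taking the comment's assumed numbers at face value:

```python
cycles = 300        # assumed cycles for one naive comparison (incl. Python overhead)
clock_hz = 3e9      # assumed 3 GHz clock
seconds = cycles / clock_hz
nanoseconds = seconds * 1e9   # 300 / 3e9 s = 1e-7 s = 100 ns
```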
Not saying it can’t be done with a timing attack over the internet but you’d need a huge sample size.
A real attacker would try to minimize all those constraints by getting as close to the target as possible — for example, colocating in the same datacenter, or even the same rack if possible.
In Python you can use secrets.compare_digest(a, b) for a safe (constant-time) comparison: https://docs.python.org/3/library/secrets.html#secrets.compa...
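Minimal usage of that function — unlike `==`, it runs in time independent of where the inputs first differ:

```python
import secrets

token = secrets.token_hex(16)       # a fresh 32-hex-char random token
wrong = secrets.token_hex(16)

# Constant-time comparison; avoid `==` when one side is attacker-controlled.
assert secrets.compare_digest(token, token)
assert not secrets.compare_digest(token, wrong)
```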