Hashing Explained: Checksums, Passwords & HMAC

Hashing sits under file downloads, version control, digital signatures and every login you've ever used, yet it's routinely muddled with encryption and encoding. They're three different things solving three different problems, and getting them straight clears up most of the confusion. You can hash any text on the hash generator to follow along.

what it is

A one-way fingerprint

A cryptographic hash function takes any input (a word, a file, a gigabyte) and produces a fixed-length string called a digest. Three properties make it useful. It's deterministic: the same input always gives the same digest. It has the avalanche effect: change one bit of input and about half the output bits flip, so similar inputs produce wildly different hashes. And it's one-way: there's no practical way to run it backwards and recover the input; the best an attacker can do is guess inputs and hash each one. SHA-256 of hello begins 2cf24dba…; change it to Hello and you get 185f8db3…, with no resemblance to the first. That extreme sensitivity is the property integrity checks lean on: flip a single byte anywhere in a multi-gigabyte file and its digest comes out unrecognisable.
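The three properties are easy to see with Python's standard hashlib module. This is a minimal sketch; the helper names sha256_hex and differing_bits are made up for the illustration.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions at which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


lower = sha256_hex("hello")
upper = sha256_hex("Hello")

print(lower[:8])                      # 2cf24dba
print(upper[:8])                      # 185f8db3
print(lower == sha256_hex("hello"))   # True: same input, same digest
print(len(sha256_hex("x" * 10_000)))  # 64 hex characters, whatever the input size
print(differing_bits(lower, upper))   # avalanche: expect roughly half of the 256 bits
```

Note that nothing in hashlib takes a digest and returns the input; that absence is the one-way property in practice.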

Concept	Reversible?	Needs a key?	Used for
Encoding (Base64)	Yes, by anyone	No	Carrying data through a text channel
Encryption	Yes, with the key	Yes	Keeping data secret
Hashing	No	No	Fingerprinting and integrity

So a hash is not encryption (there's no key and nothing to decrypt) and not encoding (you can't get the data back). If someone says they'll "hash it so it's encrypted", that's the first sign something's about to go wrong.
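A short sketch of the difference between the encoding and hashing rows, using only the standard library (encryption is left out because it needs a third-party package). The sample message is a placeholder.

```python
import base64
import hashlib

message = b"meet at noon"  # placeholder data for the demo

# Encoding: anyone can reverse it, no key involved.
encoded = base64.b64encode(message)
decoded = base64.b64decode(encoded)
print(decoded == message)  # True

# Hashing: fixed-length fingerprint with no inverse operation.
digest = hashlib.sha256(message).hexdigest()
print(len(digest))  # 64
```

Base64 hands the original bytes back to anyone who asks; the digest can only be compared against another digest.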

integrity

Checksums: proving a file is intact

The everyday use is integrity. When a project publishes the SHA-256 of a download, you can hash the file you received and compare: if the digests match, the file arrived byte-for-byte intact and wasn't corrupted or swapped for a tampered copy. That guarantee is only as good as the published digest, so it has to reach you through a channel an attacker couldn't also have altered. Git uses hashing for the same reason at a deeper level: every commit and file is addressed by its hash, which is how Git knows instantly whether two trees are identical. Worth separating from this is the humble checksum like CRC32: it's fast and catches accidental corruption (a flipped bit in transit), but it's trivial to forge on purpose, so it's for detecting accidents, not attackers. When tampering is on the table, you want a cryptographic hash, not a CRC.
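Verifying a download might look like the sketch below. The function names are hypothetical, and a throwaway temp file stands in for the downloaded file so the example runs on its own; in real use the published digest would be copied from the project's release page.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large downloads never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, published_hex: str) -> bool:
    """Compare the file's digest with a published one, ignoring case and whitespace."""
    return sha256_of_file(path) == published_hex.strip().lower()


# Demo: a temp file plays the role of the download.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    download_path = tmp.name

published = hashlib.sha256(b"hello").hexdigest()  # stand-in for the published value
intact = matches_published(download_path, published)
tampered = matches_published(download_path, "0" * 64)
os.remove(download_path)

print(intact)    # True
print(tampered)  # False
```

Reading in chunks matters for the multi-gigabyte case: the digest is the same as hashing the whole file at once, but memory use stays flat.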

passwords

Why you can't store passwords with SHA-256

Here's the rule, stated as bluntly as it deserves: a general-purpose hash like SHA-256 must never be how you store passwords. The problem is speed. SHA was engineered to run as fast as the hardware allows, and dedicated cracking rigs weaponize exactly that, testing billions of candidate passwords per second against a stolen table of hashes until matches fall out. A digest built for raw throughput is a gift to whoever walks off with your database. Pile on the existence of rainbow tables (vast precomputed maps from common inputs straight to their digests) and an unsalted hash frequently isn't cracked at all, merely looked up.
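The "merely looked up" point can be sketched in a few lines. This is a plain precomputed dictionary, not a real rainbow table (those use hash chains to save space), and the three-word list is a toy stand-in for the huge wordlists attackers actually use.

```python
import hashlib


def unsalted(password: str) -> str:
    """The anti-pattern: a bare SHA-256 of the password."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Attacker, ahead of time: hash a wordlist once and keep the results.
wordlist = ["123456", "password", "letmein"]
precomputed = {unsalted(word): word for word in wordlist}

# Later: a stolen database row for a hypothetical user.
stolen = unsalted("letmein")
print(precomputed.get(stolen))  # letmein
```

No cracking happens at lookup time; the cost was paid once, in advance, and is reused against every database that stores bare digests.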

Two fixes are needed, and you don't get to pick just one. A salt — a unique random value kept per account and mixed into the hash — forces two identical passwords onto different digests, which instantly renders precomputed tables useless. On its own, though, a salt doesn't help against a fast offline guess, so the algorithm itself has to be deliberately expensive: slow, memory-hard, and fitted with a difficulty setting you can dial upward as machines get quicker. Purpose-built password hashes roll all of that into a single call, salt included — Argon2id is the current recommendation, with bcrypt and scrypt still perfectly respectable. Assembling your own from SHA plus a salt misses the entire point, because the cost is the protection and SHA flatly refuses to be costly.
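As a sketch of both fixes together, here is scrypt from Python's standard hashlib (available when Python is built against a recent OpenSSL). scrypt is used only because it ships with the standard library; Argon2id needs a third-party package. Unlike the higher-level password libraries, hashlib.scrypt does not manage the salt for you, so this example generates and returns it explicitly. The cost parameters and function names are illustrative assumptions, not tuned recommendations.

```python
import hashlib
import hmac
import os

# Illustrative cost settings (assumptions): raise N as hardware gets faster.
N, R, P = 2**14, 8, 1
MAXMEM = 64 * 1024 * 1024  # allow scrypt up to 64 MiB of working memory


def _derive(password: str, salt: bytes) -> bytes:
    """Run scrypt with the module-level cost settings."""
    return hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt, n=N, r=R, p=P, maxmem=MAXMEM, dklen=32,
    )


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, key); store both alongside the account."""
    salt = os.urandom(16)  # unique random salt per account
    return salt, _derive(password, salt)


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive with the stored salt and compare in constant time."""
    return hmac.compare_digest(_derive(password, salt), expected)


salt_a, key_a = hash_password("hunter2")
salt_b, key_b = hash_password("hunter2")
print(key_a != key_b)                             # True: same password, different salts
print(verify_password("hunter2", salt_a, key_a))  # True
print(verify_password("hunter3", salt_a, key_a))  # False
```

Each call is deliberately slow and memory-hungry compared with a bare SHA-256, which is exactly the cost an attacker has to pay per guess.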

If you see sha256(password) near a login, treat it as a bug. Not a style preference, not "good enough for now" — a security defect. Use bcrypt, scrypt or Argon2 for anything that protects an account, and reserve SHA for the checksums, fingerprints and signatures it's actually built for.

keyed hashing

HMAC: proving who sent something

A bare hash has one blind spot: it confirms data is unchanged, but says nothing about who it came from, since the function is public and anyone at all can recompute a matching digest. Authentication needs a secret somewhere in the mix, and that is precisely what HMAC supplies: it combines your message with a key shared only between the two parties, yielding a tag nobody can reproduce without that key. The pattern turns up across integrations. When a payment processor or a code host calls your webhook, it attaches an HMAC tag derived from the request body and a shared secret; you recompute the tag on your end and discard anything that fails to match, which is what stops an attacker from inventing fake events. Signed API requests follow the same recipe, and a JWT's HS256 signature is HMAC-SHA256 computed over the token's encoded header and payload. The engine doing the work is still SHA; adding a secret key is what promotes "nothing tampered with this in transit" into "this genuinely originated from who it claims".
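The webhook check described above can be sketched with Python's standard hmac module. The function names, the secret and the request body are placeholders; real providers document their own header name and tag encoding, which this sketch does not try to reproduce.

```python
import hashlib
import hmac


def sign(body: bytes, secret: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the raw request body, as hex."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_authentic(body: bytes, received_tag: str, secret: bytes) -> bool:
    """Recompute the tag and compare in constant time to avoid timing leaks."""
    expected = sign(body, secret)
    return hmac.compare_digest(expected.encode(), received_tag.encode())


secret = b"shared-webhook-secret"           # placeholder shared secret
body = b'{"event": "payment.succeeded"}'    # placeholder request body
tag = sign(body, secret)                    # what the sender would attach

print(is_authentic(body, tag, secret))                         # True
print(is_authentic(b'{"event": "refund"}', tag, secret))       # False: body changed
print(is_authentic(body, sign(body, b"wrong-key"), secret))    # False: wrong key
```

Sign and verify the exact bytes received, before any JSON parsing; re-serialising the body can change whitespace or key order and break an otherwise valid tag.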

choosing

Which algorithm

For integrity and fingerprinting, SHA-256 is the sensible default; SHA-512 is fine too and can be faster on 64-bit hardware. Avoid MD5 and SHA-1 for anything security-related — both are broken, with practical collisions demonstrated, meaning an attacker can craft two different inputs with the same hash. They linger only as fast checksums for non-adversarial corruption checks, and even there SHA-256 is a safer habit. For passwords, ignore the SHA family entirely and use a password-specific function. The decision tree is short: integrity, use SHA-256; authenticity with a shared secret, use HMAC; passwords, use Argon2 or bcrypt.

Keep those three boxes separate — fingerprint, secret-keyed tag, password hash — and the rest follows. Most hashing disasters are really one of these tools being asked to do another's job.