6. Cryptography - Hashing
Sixth section of Complete Beginner learning path.
Introduction
Some common terms to know before continuing are:
Plaintext : date before encryption or hashing, commonly text but can be a photo or other file.
Encoding : not a form of encryption, just a way to represent data like base64 or hexadecimal, immediately reversible.
Hash : output of hash function, can also be a verb like "to hash" - meaning to get the hash value of data.
Brute force : attacking cryptography by trying every password or key.
Cryptanalysis : attacking cryptography by finding a weakness in the underlying maths.
What is a Hash?
With hash functions, there is no key and it is intended to be impossible (or extremely difficult) to go back from the output to the original input. Hash functions will take input of any size and create a summary (or digest) of the input. The output is a fixed size and it is very hard to predict what the output will be for any given input and vice versa.
Good hashing algorithms should be fast to compute and very slow to reverse, any slight change in the input data should cause a large change in the output. The output is normally raw bytes encoded in base64 or hex, decoding this will give nothing useful.
Hash Collisions
Hash collisions occur when two different inputs give the same output. The pigeonhole effect prevents collisions from being avoided. There are a set number of different outputs for the hash function, but it can be given any size input. As there are more inputs than outputs, some of the inputs must give the same output - if you have 128 pigeons but only 96 pigeonholes, some of the pigeons will need to share.
MD5 and SHA1 are technically insecure due to engineering hash collisions. No attack has been able to get a collision in both MD5 and SHA1 at the same time. An example of MD5 collision can be seen here and an example of a SHA1 collision can be seen here.
Hashing for Password Verification
Web apps will often need to verify a users password - storing these in plaintext would be bad practice, numerous data breaches have leaked plaintext passwords. The word list "rockyou.txt" on Kali comes from a company that made widgets for MySpace and stored passwords in plaintext.
Passwords cannot simply be encrypted as the key needs to be stored somewhere, if an attacker gets the key, they can decrypt the passwords. Instead of encryption, hashing is used. Every password hash is stored (rather than the actual password) meaning that in a leak, the attacker would need to crack each hash. The issue here, is if two users have the same password then they will have the same hash. This means someone could create a "Rainbow Table" to crack the hashes. A rainbow table is a lookup table of hashes to plaintext which helps identify a password by its associated hash.
Websites like Crackstation use massive rainbow tables to provide fast hash cracking for hashes without salts. To protect against rainbow tables, salt is added to passwords. Salt is randomly generated and stored in the database, unique to each user. Salt is added to the start or end of the password before its hashed, meaning every user has a different hash even when they have the same password. Functions like bcrypt and sha512crypt do this automatically.
Hash Recognition
Automated tools exist for hash recognition such as HashID, however, these can be unreliable for many formats. Tools like HashID are most reliable for formats which have a prefix. Context is important as well as tools, for example, a hash found in a web app database is more likely to be MD5 than NTLM. Unix password hashes are easy to recognise as they have a prefix, the standard format is: $format$rounds$salt$hash.
Windows passwords are hashed in NTLM which is a variant of MD4. NTLM, MD4 and MD5 are all visually indistinguishable, so context is important. On Windows, hashes are stored in the SAM (Security Accounts Manager), the hashes stored here are split into NT hashes and LM hashes. On Linux, hashes are stored in /etc/shadow which is only readable by root. Examples of hashes and their formats/prefixes can be found on the HashCat Wiki.
Last updated