Birthday Attack

A Birthday Attack is a brute force cryptographic attack which exploits the possibility of a hash collision. This exploit cracks the mathematics behind the birthday paradox through which the chance of sharing one birthday by two people is quite higher than one may think.

The Birthday Paradox

Suppose if there are "n" number of people in a room, there is a probability that some of them might have their birthdays on a similar day. Intuitively this probability might seem small. But, actually its about 50% (r1) chance that two people might share the same birthday in a room of 23 people (n=23) and this reaches 99.9% (r2) if the number of people in room are 75 (n=75).

One important point to consider here is that we don"t care which two people share a birthday. To arrive at this 50% (r1) probability of a birthday match in a room of 23 people we form 253 pairs/comparisons.

Comparisons Derivation

Whereas, if we did consider which person share a birthday i.e we fix a person/birthday and search for a person with similar birthday then we would need 253 people (n=253) to get same number of pairs/comparisons and arrive at same 50% (r3) probability.

Maths behind The Birthday Paradox

According to the pigeonhole principle, if n items are put into m containers, with n>m, then at least one container must contain more than one item.
In relation to the birthday paradox, the pigeonhole principle can be used to intuitively see that as the number of people grow larger (or approach 367), at least 2 people will have to be assigned to a certain “box” (birthday) since there are only 366 possible birthdays, resulting in people having the same birthday.

Note: To simplify the calculation we will not consider leap year and assume every birthday has an equal occurrence chance.

Instead of calculating the chance of two people sharing a birthday in a room we calculate the opposite i.e chances of two people not sharing a birthday. By this approach, the first person can have any birthday so 365/365 chance, for second person not to share a birthday its 364/365 (excluding the birthday of the first person), for third is 363/365 … so on.

Equation Derivation

It turns out that when n is on the order of .
Suppose, we have a hash algo which gives a output of 64-bit. Total hashes possible would be i.e . This means, we have a 50% chance of collision after only tries.
For a 128 bit digest, this requires () tries. (That is computationally infeasible)

Birthday Paradox exploitation in hashing

Hash function is a versatile one-way* cryptographic algorithm that maps an input (message) of any size to a unique output of a fixed length of bits (message digest).

def hash_it(input: int):
  return input % 1024

Our simple hashing approach would take the modular of the integer passed and return a output of a fixed range (0–1023). Our hash function does not map our passed params to unique indexes. This anomaly of a hash function is known as a hash collision.
According to the Pigeonhole Principle, it is impossible to avoid hash collisions, unless the size of the output is at least as large as the input (which defeat"s the purpose of a hash function).

Exploitation in action

Suppose "A" wants to share a document with "B". "A" signs the document by n-bit hash code.
Now, evil "C" can generate versions of the document (appending space, blank line, etc to output different hash but still maintaining the document visually alike to the original), generate versions of the evil document, and see which pair hashes to the same value.
The probability that there is a benign document and an evil document with the same hash is greater>0.5 i.e . If no match found, additional valid and fraudulent document are generated until a match is made.
"C" then presents the benign document for "B" to agree to, and now he has the evil document with the same signature. This attack is called a "collision attack"; as the attacker is looking for a "collision".

On the other hand, if "C" has no control over the good document, this doesn"t work. He can, of course, create 𝑁 evil documents and see if there is one with the same hash as the good one, however the probability of that happening is significantly smaller if 𝑁 is large it goes upto . This attack is called a "second preimage attack", that is, find a second message that hashes to the same value as a given one. If the source document is already fixed then we can do a pre-image attack where we try to find a message that has a specific hash value.