Wednesday, May 6, 2015

How to make two binaries with the same MD5 hash

One question I was asked when I demo'd creating two PHP files with the same hash is; does it work on compiled binaries?

Well the answer is yes in fact that is where I first got the idea from, in this demo.

That example uses a C program as both the target and also to do the binary manipulation, which slightly of obscures the way it works. It also makes use of an old very slow implementation on the Wang attack to generate the collisions. To better and more quickly show how it works for an upcoming talk I have created a really simple example using a PHP script to manipulate a binary once compiled.

I have put all the code used here on github.

Below is the super simple C program it compares two strings and if they don't match it prints out an ASCII art picture of an angel. If they do match you get a picture of a devil.
It can be compiled with gcc and executed simply by doing
longEgg$ gcc -o demo ./demo.c
longEgg$ chmod a+x demo
longEgg$ ./demo

Executing the program will print out the angel since the two strings differ in the last letter.

Now we have our compiled binary we need to do a bit of mucking about with it. What we are going to do is insert a MD5 collision into the long string of  A's of the dummy text. We only need to insert two blocks of 64 bytes but we need to insert it at the beginning of a  block i.e. when the byte length is a multiple of 64 bytes.

When we run the php script over it the first time it finds such a location and calculates the value of the four chaining variables of MD5 at that point in the file. It prints out the hex values concatenated together as a hash. We can now take that value and search for an MD5 collision with that initial state. The best MD5 collision finder is Marc Stevens fastcoll. It can typically find collisions in a couple of seconds using a a variant of the Wang attack. After downloading it you will need to compile it. There should be a Makefile for it in the code on github. Running it specifying the initial state and output files is shown below.

longEgg$ wget
longEgg$ unzip
longEgg$ make
longEgg$ chmod a+x fastcoll
longEgg$./fastcoll -i c15cfe39c40e47f5b8ae31e6658fd1bd -o a b
The -o option specifies the output files and so will create two new files a and b which contain 2 blocks of binary data. These blocks only work as MD5 collisions within the binary at that point. Running the php script for a second time will create two copies of the original compiled binary with the collisions inserted in the appropriate places.
longEgg$ I want to replace 128 bytes at position 6528 in colliding_binaries/demo
longEgg$ Chaining variable up to that point is c15cfe39c40e47f5b8ae31e6658fd1bd
longEgg$ Just output new file /Users/nmchugh/longEgg/devil with hash dea9dc288b6c56626997ce86ca8eb6da
longEgg$ Just output new file /Users/nmchugh/longEgg/angel with hash dea9dc288b6c56626997ce86ca8eb6da
So now we have created two more files angel and devil. Running each of those should give different outputs.

But they should have the same MD5 value.
longEgg$ md5 angel devil 
MD5 (angel) = dea9dc288b6c56626997ce86ca8eb6da
MD5 (devil) = dea9dc288b6c56626997ce86ca8eb6da