A while ago a lot of people (~90,000) visited my site for a post about how easy it is to make two images with the same MD5 hash by using a chosen prefix collision. I used Marc Stevens' HashClash on AWS and estimated the cost at around $0.65 per collision.
Given the level of interest, I expected to see cool MD5 collisions popping up all over the place. Possibly it was enough for most people to know it can be done quite easily and cheaply, but I may also have left too many details out of my original post.
In this further post I’ve made an AWS image available and created a step-by-step guide so that you too can create MD5 chosen prefix collisions and amuse your friends (disclaimer: they may not be that amused). All you need to do is create an AWS instance and run a few commands from the command line. There is an explanation of how the chosen prefix collision works in Marc Stevens' Master's thesis.
Here are the steps to create a collision.
1) Log on to the AWS console and create a spot request for an instance based on my public Amazon Machine Image (AMI). Spot requests are much cheaper than creating instances directly, typically $0.065 an hour. Spot instances can be destroyed, losing your data, if the price spikes, but for fun projects they are the way to go.
I have created a public AMI called hash-clash-demo. It has the ID ami-dc93d3b4 and is in the US East (North Virginia) region. It has all the software necessary to create a collision pre-built. Search for ami-dc93d3b4 in community AMIs and then choose a GPU2 instance. I promise it does not mine bitcoins in the background, although thinking about it, this would be a good scam and I may introduce this functionality.
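To put the spot price in context, here is a rough back-of-the-envelope cost calculation. It is only a sketch: the $0.065 hourly price is the typical figure quoted above, the 10-hour run time is an assumed round number, and spot_cost is a made-up helper name.

```shell
# spot_cost PRICE_PER_HOUR HOURS
# Prints the approximate cost in dollars, rounded to two decimal places.
# spot_cost is a hypothetical helper, not part of HashClash or the AWS tools.
spot_cost() {
    awk -v price="$1" -v hours="$2" 'BEGIN { printf "%.2f\n", price * hours }'
}

# Typical spot price from this post, with an assumed 10-hour run.
spot_cost 0.065 10
```

With those numbers the result is 0.65, in line with the estimate of around $0.65 per collision. A longer run or a price spike will push the figure up accordingly.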
2) Once your request has been created and evaluated, you will hopefully have a running instance to connect to via SSH. You may need to create a new key pair; follow the instructions on AWS to do this and install the key on your local machine. Once you have your key installed, log on to the instance via ssh as ec2-user.
3) The shell script for running HashClash is located at /home/ec2-user/hashclash/src/scripts. Change into that directory and download some data to create a collision with. Here I download a couple of JPEG images from Tumblr.
4) It is best to run the shell script in a screen session so you can detach from it and do other stuff. Start a screen session by typing
screen
Once you are in the screen session, kick off the cpc.sh shell script with your two files. Send the output to a log file; in this case I called it demo.output.
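The exact command is not reproduced in the text, so the following is a sketch of what that invocation could look like. start_search is a made-up wrapper name, and capturing stderr alongside stdout is an assumption.

```shell
# start_search FILE1 FILE2 [LOGFILE]
# Runs cpc.sh (from the current directory) on two input files and sends
# everything it prints to a log file, demo.output by default.
# start_search is a hypothetical wrapper name; redirecting stderr as well
# as stdout (2>&1) is an assumption.
start_search() {
    ./cpc.sh "$1" "$2" > "${3:-demo.output}" 2>&1
}
```

Inside the screen session this would be called as start_search plane.jpg ship.jpg (input names inferred from the .coll output files mentioned later in this post), or you can simply type the ./cpc.sh line directly with your own file names.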
Detach from the screen session with Ctrl+A, then D.
5) Tailing the log file, you should be able to see the start of the birthday attack, which gets the hash differences into the correct locations.
tail -f demo.output
6) Leave the birthday search to do its thing for an hour or so. Hopefully when you come back the attack will have moved on to the next stage, creating the near collision blocks that gradually reduce the hash differences. The best way to check this is to look at the files created. The workdir0 directory contains all the data for the current collision search for the first near collision block. More of these will be created as more near collision blocks are created.
7) Go away again; a watched collision pretty much never happens. Check back in ~5 hours to see that it is still going. Tailing demo.output and listing the directory should let you know roughly what stage the attack is at.
Here we are only at block number 2 of probably 9.
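A quick way to turn the directory listing into a number is to count the work directories. This is a sketch that assumes the directories are named workdir0, workdir1, and so on (only workdir0 is named in this post); count_workdirs is a made-up helper name.

```shell
# count_workdirs [DIR]
# Counts directories named workdir<N> directly inside DIR (default: the
# current directory). The workdir<N> naming beyond workdir0 is an assumption.
count_workdirs() {
    find "${1:-.}" -maxdepth 1 -type d -name 'workdir[0-9]*' | wc -l | tr -d ' '
}

echo "near collision blocks started: $(count_workdirs)"
```

Run it from the scripts directory where the search was started. The count only tells you how many blocks have been started, not how many are still to come.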
8) Come back again about 10-12 hours from start and with any luck we have a collision.
This one finished at 02:45 in the morning, having been started at 10:30 the previous morning. You can tell when it finished because that was the last time the log was written to. If the log file is still being updated, the collision search is still going on. It took 9 near collision blocks to finally eliminate all the differences, which is normal. At just over 16 hours, this run was a bit longer than average.
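To check whether the log is still being written without sitting and watching it, you can look at how long ago it was last modified. A minimal sketch, assuming GNU stat (as found on most Linux systems); log_age_seconds is a made-up helper name.

```shell
# log_age_seconds FILE
# Prints how many seconds ago FILE was last modified.
# Uses GNU stat's -c %Y (modification time as seconds since the epoch).
log_age_seconds() {
    modified=$(stat -c %Y "$1") || return 1
    now=$(date +%s)
    echo $(( now - modified ))
}
```

For example, log_age_seconds demo.output printing a small number suggests the search is still running. A large number is only a hint that it has finished, since the log may simply go quiet for a while during a long computation.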
The collisions have been created in files named plane.jpg.coll and ship.jpg.coll. You can verify that they do indeed have the same MD5 hash with md5sum.
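As a sketch of that check, the helper below confirms that two files differ in content yet share an MD5 hash. check_collision is a made-up name; it relies only on the standard cmp and md5sum tools.

```shell
# check_collision FILE1 FILE2
# Succeeds only if the two files have different contents but the same MD5.
# check_collision is a hypothetical helper built on cmp and md5sum.
check_collision() {
    if cmp -s "$1" "$2"; then
        echo "files are byte-for-byte identical, so not a collision"
        return 1
    fi
    # Reading from stdin keeps the file name out of md5sum's output.
    hash1=$(md5sum < "$1") || return 1
    hash2=$(md5sum < "$2") || return 1
    if [ "$hash1" = "$hash2" ]; then
        echo "same MD5, different contents: ${hash1%% *}"
    else
        echo "MD5 hashes differ"
        return 1
    fi
}
```

On the instance this would be run as check_collision plane.jpg.coll ship.jpg.coll. Comparing the contents first guards against the trivial case of accidentally hashing two copies of the same file.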
Here are the images with collision blocks added.
I downloaded them to my local machine with scp.