Checksums and hashes are great for detecting errors and modifications of data, but what if your data needs to detect changes to itself? I struggled with this problem when I tried to create an isolated program that could check during runtime if it had been modified. This is very useful if you need to run your program on a remote server that you don't have physical control over. More formally, what we want is computation integrity. The idea was simple:Â
if( md5(read_file(myself)) == hardcoded_hash ) execute();
I quickly realized this would be challenging since changing the hardcoded hash would also change the hash of the file. Below is an example of trying to include the md5 of a sentence in the sentence itself. Â
As you notice it is quite frustrating trying to find a match. In all honesty I'm not even sure a solution exists. Even if we assume MD5 doesn't have any collsions for this special set of messages I can't conclusively prove that a solution to md5("The md5..." + h) = h
exists. This is of course by design since it shouldn't be easy to correlate input with output in a cryptographic hash.Â
I've made a small collection of these types of self referential puzzles that you can try. It was interesting to see that while normal CRC is not as secure as MD or SHA, and can be easily forged[1], it was not as straight forward to try to crack the self referential version of CRC. You can try it out yourself, and this time I can guarantee that a solution exists!Â
After doing some more research I found out that others have fought with this problems too. For example trying to put the hash of a PDF inside the PDF itself[2]. While this didn't seem to succeed, only a year ago someone managed to create a gif that contained its own md5 hash[3], once again proving how vulnerable MD5 is.
md5() = f5ca4f935d44b85c431a8bf788c0eaca
Going back to the original problem, how can we make sure that our program runs as expected in an un-trusted environment? Turns out we can't. As Falcarin et al.[4] puts it:
Assuring that a given code is faithfully executed with defined parameters and constraints on an un-trusted host is an open problem.
However, there exists a silver bullet solution, homomorphic encryption. Newer research[5] shows that for some applications, homomorphic encryption can be used to ensure computation integrity. Problem solved right? Well implementing a fully homomorphic encryption (FHE) scheme is usually not feasible. Using Gentry's FHE[6]Â can require public keys of up 2.3 GB and a single bit operation can take up to 30 mins. A lot of interesting research is being done in this area and hopefully we will some faster FHE schemes in the near future. But until then you should run your own servers if you want to stay secure.Â
Â