Announcement: hashdb toolset

The text file govdocs1-first512-first4096-docid.txt, which contained MD5 hashes of the first 512 bytes and the first 4096 bytes of every file in the GOVDOCS1 corpus, has been removed. The file was provided to assist research on block hashes. We have since created the hashdb toolset, which supports creating and working with databases of block hashes. Please refer to https://github.com/simsong/hashdb/wiki to download the code, follow continuing progress on this topic, and find links to relevant papers, including:

Distinct Sector Hashes for Target File Detection

A related master's thesis on this topic was completed at the Naval Postgraduate School in 2012 and can be downloaded for additional reading: http://simson.net/ref/2012/kmf_thesis.pdf
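
For readers who want to compute the kind of hashes the removed file held, the following Python sketch computes the MD5 of the first 512 bytes and the first 4096 bytes of a single file. It is an illustration only and is not part of hashdb. The function name prefix_md5s is hypothetical, and the handling of files shorter than the prefix length (hash whatever bytes exist) is an assumption, since the announcement does not say how the original file treated them.

```python
import hashlib


def prefix_md5s(path, sizes=(512, 4096)):
    """Return {size: hex MD5 of the first `size` bytes of the file at `path`}.

    Hypothetical helper for illustration. If the file is shorter than a
    requested size, the hash covers only the bytes that exist (an assumption).
    """
    # Read just enough of the file to cover the largest requested prefix.
    with open(path, "rb") as f:
        head = f.read(max(sizes))
    # Hash each prefix separately by slicing the bytes already read.
    return {n: hashlib.md5(head[:n]).hexdigest() for n in sizes}
```

Calling prefix_md5s on a file path returns a dictionary keyed by prefix length, for example with keys 512 and 4096. Applying it to every file in a corpus would give a table of the same general kind as the removed text file, though the exact layout of that file is not described here.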