Announcement: hashdb toolset

The text file govdocs1-first512-first4096-docid.txt, which contained MD5 hashes of the first 512 bytes and the first 4096 bytes of every file in the GOVDOCS1 corpus, has been removed. The file was provided to assist research on block hashes. We have since created the hashdb toolset, which supports creating and working with block hash databases. Please refer to https://github.com/simsong/hashdb/wiki to download the code, follow ongoing work on this topic, and find links to relevant papers, including:

Distinct Sector Hashes for Target File Detection

A related master's thesis on this topic was completed at the Naval Postgraduate School in 2012 and is available for additional reading: http://simson.net/ref/2012/kmf_thesis.pdf
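
For readers who relied on the removed text file, the kind of hashes it held (MD5 of the first 512 bytes and of the first 4096 bytes of a file) can be recomputed with a few lines of Python. This is only a minimal sketch: the function name prefix_md5s is hypothetical, it is not part of hashdb, it does not reproduce the layout of the removed file, and it assumes that a file shorter than the requested size is hashed over whatever bytes it has.

```python
import hashlib


def prefix_md5s(path, sizes=(512, 4096)):
    """Return {size: hex MD5 of the first `size` bytes} for one file.

    Hypothetical helper for illustration only; not part of hashdb.
    Assumption: files shorter than `size` are hashed over the bytes present.
    """
    with open(path, "rb") as f:
        # Read once, up to the largest prefix requested.
        head = f.read(max(sizes))
    return {n: hashlib.md5(head[:n]).hexdigest() for n in sizes}


if __name__ == "__main__":
    import sys

    # Print one line per file given on the command line:
    # <md5 of first 512 bytes> <md5 of first 4096 bytes> <path>
    for name in sys.argv[1:]:
        digests = prefix_md5s(name)
        print(digests[512], digests[4096], name)
```

For building and querying block hash databases across a whole corpus, the hashdb toolset linked above is the intended route.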
