Collection of Hash Values of Forensically Uninteresting Files Available
This file lists SHA-1 hash values of files that are uninteresting for
forensic investigations on a variety of criteria, including frequency
on drives of both hash value and path, time of creation within both
the minute and the week, file size, directory context both in path and
in sibling files, and file extension. Hash values are listed as 40
hexadecimal characters. This data is derived from the Real Drive
Corpus collected by the DEEP Project at the U.S. Naval Postgraduate
School, plus data from drives in classrooms and laboratories at NPS
and some other sources. Hash values in the January 2014
version of NSRL (the National Software Reference Library, nist.gov)
have been excluded.
The criteria for selecting these hash values and the methods used to
obtain them are described in
http://faculty.nps.edu/ncrowe/uninteresting.htm but have now been
applied to significantly more files than the corpus used for the
paper. Our methods focus on cross-correlation of files in a large
corpus and are thus quite different from those used in collecting the
NSRL data. They were obtained from images of 245 million files on
3905 drives. Currently our set has 16 million hash values not in
NSRL, and NSRL has currently 36 million hash values, so this is a
significant supplement to NSRL.
This data was produced in July 2014 by Neil Rowe, firstname.lastname@example.org.
Please acknowledge us in publications if you use this data.