Test Disk Image of Emails Available

I have created a new disk image called 2010-nps-emails that can be used for testing programs that find email addresses or perform string search. The disk image consists of 30 different email addresses, each one stored in a different document with a different coding scheme. Below is a list of the email addresses and their…

Announcing GOVDOCS1.1

As an artifact of the way the NPS GOVDOCS1 corpus was collected, many of its file extensions did not reflect the type of the underlying file. For example, many files that were labeled ‘.xls’ did not contain Microsoft Excel spreadsheets, but instead contained HTML error messages from US government web…

Real Data Corpus FAQ

The Real Data Corpus (RDC) is a collection of raw data extracted from data-carrying devices that were purchased on the secondary market around the world. Many studies have shown that hard drives, cell phones, USB memory sticks, and other data-carrying devices are frequently discarded by their original users without the data…

Disk Images

We have many sources of disk images available for use in education and research. The easiest disk images to work with are the NPS Test Disk Images. We also have  that contain multiple disk images. Finally, we have real disk images containing real data from real people; IRB approval is required to work with those…