We have many sources of disk images available for use in education and research. The easiest disk images to work with are the NPS Test Disk Images. We also have detailed scenarios that contain multiple disk images. Finally, we have real disk images containing real data from real people; IRB approval is required to work with those disks.
A word about copyright: Some of the disk corpora contain information that is covered by copyright under US law, specifically copies of the Microsoft Windows operating system. US copyright law has a four-factor test that determines whether the distribution of copyrighted material is permissible under “fair use.” To address this, we have developed a program that breaks Microsoft executables in a way that cannot be reversed. We believe that distributing disk images with broken executables for research and educational purposes is permissible under fair use because doing so does not damage the value of the Microsoft copyrighted information that the disk images contain. Please let us know if you feel differently or if you have an alternative strategy for distributing these important research materials.
NPS Test Disk Images
NPS Test Disk Images are a set of disk images that have been created for testing computer forensic tools. These images are free of non-public Personally Identifiable Information (PII) and are approved for release to the general public. The NPS-created data in these images is public domain and free of any copyright restriction; the images may contain some copyrighted data that was made freely available by the copyright holder. These copyrights, where known, are noted in the files themselves. Currently the following images in the NPS corpus have been released:
- nps-2009-canon2 – A set of images taken with a Canon digital camera that can be used to test basic file recovery, fragmented file recovery, and file carving.
- nps-2009-casper-rw – An ext3 file system from a bootable USB token that had an installation of Ubuntu 8.10. The operating system was used to browse several US Government websites.
- nps-2009-hfsjtest1 – A test image of a journaled HFS file system in which the data from a previous version of a file can only be recovered from the HFS journal.
- nps-2009-ntfs1 – A test image of an NTFS file system including unfragmented and highly fragmented files stored in raw, compressed, and encrypted directories. The decryption key is provided.
- nps-2009-ubnist1 – The FAT32 file system from which the nps-2009-casper-rw disk image was extracted.
- nps-2009-domexusers – A disk image of a Windows XP SP3 system that has two users, domexuser1 and domexuser2, who communicate with a third user (domexuser3) via IM and email. Two versions of this disk image will be provided:
  - nps-2009-domexusers – The full system, distributed as an encrypted disk image.
  - nps-2009-domexusers-redacted – The full system with the Microsoft Windows executables redacted so that they cannot be executed.
- nps-2010-emails – A test disk image containing 30 different email addresses, each stored in a different document with a different coding scheme.
- nps-2014-usb-nondeterministic – A series of disk images made from a USB storage device that returned different data each time it was read. The original submission ZIP file and narrative are provided, as well as E01 files that were created by extracting the raw files from the ZIP archive and re-encoding them.
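The nps-2009-canon2 image is intended, among other things, for testing file carving. As a minimal sketch of the simplest technique, header/footer carving, the Python below scans a raw image for JPEG start-of-image (FF D8 FF) and end-of-image (FF D9) markers. It assumes the image has already been converted to raw form (not E01 or AFF), and the 20 MiB size cap and the `carved_NNNN.jpg` output names are hypothetical choices. It does not reassemble fragmented files, and because it stops at the first end marker it may truncate photos that contain an embedded JPEG thumbnail, as many camera JPEGs do.

```python
import sys
from pathlib import Path

JPEG_HEADER = b"\xff\xd8\xff"  # SOI marker plus the first byte of the next marker
JPEG_FOOTER = b"\xff\xd9"      # EOI marker
MAX_SIZE = 20 * 1024 * 1024    # assumed upper bound on one photo (hypothetical)


def carve_jpegs(data: bytes, max_size: int = MAX_SIZE) -> list[tuple[int, int]]:
    """Return (start, end) byte offsets of contiguous JPEG candidates in data.

    Header/footer carving only: fragmented files are not reassembled, and an
    embedded thumbnail's EOI marker can end a candidate early.
    """
    results = []
    start = data.find(JPEG_HEADER)
    while start != -1:
        # Look for the first footer after the header, within max_size bytes.
        end = data.find(JPEG_FOOTER, start + len(JPEG_HEADER), start + max_size)
        if end == -1:
            # No footer in range: discard this header and try the next one.
            start = data.find(JPEG_HEADER, start + 1)
            continue
        end += len(JPEG_FOOTER)
        results.append((start, end))
        start = data.find(JPEG_HEADER, end)
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    image = Path(sys.argv[1]).read_bytes()  # reads the whole raw image into memory
    for i, (s, e) in enumerate(carve_jpegs(image)):
        name = f"carved_{i:04d}.jpg"
        Path(name).write_bytes(image[s:e])
        print(f"{name}: offset {s}, {e - s} bytes")
```

Comparing the carved output against the files known to be on the image is one way to see where contiguous carving succeeds and where fragmentation defeats it.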
Digital Corpora Scenarios
You will find additional disk images on the Scenarios page, including:
- M57-Jean – A single disk scenario involving the exfiltration of corporate documents from an executive’s laptop.
- Nitroba University Harassment Scenario – A fun-to-solve network forensics scenario.
- M57-Patents – A complex scenario involving multiple drives and actors set at a small company over the course of several weeks.
The Real Data Corpus
Currently there are over 750 images available for use by bona fide researchers. The images are divided into two categories:
- Non-US Persons Disk Image Corpus – Contains images from disks purchased outside the United States.
- US Persons Disk Image Corpus – Contains images from disks purchased within the United States.
More information about the Real Data Corpus is available elsewhere on this server.
CIRCL Forensics Exercises
CIRCL Forensics Exercises are small challenges developed for and during the CIRCL Forensics Trainings, as well as for workshops and presentations. Usually you will find a PDF with the slides and the solution inline, alongside a disk image containing the challenge itself:
- Wiped Disk Image
Recovering data from a wiped disk sounds impossible. But wiping a ‘big’ disk takes time. If the adversary is impatient and interrupts the wiping process after a limited time, there is a good chance of recovering some plain data, or even complete file system structures and/or partitions. The goal of this exercise is to recover data from a partially wiped disk image.
This is a manual approach for learning how to analyze data at the byte level.
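A natural first step in this kind of byte-level analysis is to locate the regions the wipe never reached. The sketch below assumes the wiping tool overwrote sectors with a single constant byte value (for example, all zeros) and that the image is raw; it reports runs of 512-byte blocks that are not uniformly filled. The block size and function names are illustrative assumptions, not part of the exercise. A wipe that writes random data would defeat this heuristic, and genuinely empty sectors in the untouched region will also split the reported runs.

```python
import sys
from collections.abc import Iterable, Iterator
from typing import BinaryIO

BLOCK_SIZE = 512  # assumed sector size


def read_blocks(f: BinaryIO, block_size: int = BLOCK_SIZE) -> Iterator[bytes]:
    """Yield successive fixed-size blocks from a raw image file."""
    while block := f.read(block_size):
        yield block


def is_uniform(block: bytes) -> bool:
    """True if every byte has the same value, as a constant-fill wipe leaves it."""
    return block.count(block[0]) == len(block)


def unwiped_runs(blocks: Iterable[bytes]) -> list[tuple[int, int]]:
    """Return (start, end) byte offsets of consecutive non-uniform blocks."""
    runs = []
    run_start = None
    offset = 0
    for block in blocks:
        if is_uniform(block):
            # A uniform block closes any open run of surviving data.
            if run_start is not None:
                runs.append((run_start, offset))
                run_start = None
        elif run_start is None:
            run_start = offset
        offset += len(block)
    if run_start is not None:
        runs.append((run_start, offset))
    return runs


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as image:
        for start, end in unwiped_runs(read_blocks(image)):
            print(f"non-uniform data at bytes {start}-{end} ({end - start} bytes)")
```

The reported offsets only show where to look; interpreting what survives (partition tables, file system structures, plain data) is still the manual part of the exercise.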
Please feel free to let us know if you find this corpus useful by leaving a comment below. If you decide to use this corpus in published research, the appropriate citation is: Garfinkel, Farrell, Roussev and Dinolt, Bringing Science to Digital Forensics with Standardized Forensic Corpora, DFRWS 2009, Montreal, Canada
Many of the disk images are distributed in E01 or AFF format. For information on format conversion, please see this page.
Looking for more disk images? You will find them:
- Computer Forensic Reference Data Sets (CFReDS) for digital evidence at the National Institute of Standards and Technology.