DigitalCorpora.org is a website of digital corpora for use in computer forensics education and research. All of the disk images, memory dumps, and network packet captures on this website are freely available and may be used without prior authorization or IRB approval. We also maintain a research corpus of real data acquired from around the world; use of that dataset is possible under special arrangement.
From here you can view the available:
Most of the disk images are distributed in EnCase E01 format. For many of the disk images we also provide a Digital Forensics XML (DFXML) file that describes the files contained within each volume. Packet captures are distributed in PCAP format. Other files are available as well.
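As a rough illustration of what "describes the files contained within each volume" can look like in practice, the sketch below streams a DFXML file and yields each file's name and size. The element names (fileobject, filename, filesize) and the sample namespace are assumptions made for this example and are not taken from the text above; check them against the DFXML file you actually download.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in document; the namespace URI here is hypothetical.
SAMPLE = b"""<?xml version='1.0'?>
<dfxml xmlns='urn:example:dfxml'>
  <volume>
    <fileobject><filename>a.txt</filename><filesize>12</filesize></fileobject>
    <fileobject><filename>dir/b.bin</filename></fileobject>
  </volume>
</dfxml>"""


def local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def iter_fileobjects(source):
    """Yield (filename, filesize) for each <fileobject> element.

    `source` is a path or a binary file-like object. The element names
    are assumed, so adjust them if your DFXML file differs.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "fileobject":
            continue
        fields = {local(child.tag): (child.text or "").strip() for child in elem}
        size = fields.get("filesize")
        yield fields.get("filename"), (int(size) if size else None)
        elem.clear()  # release the children; DFXML files can be large


if __name__ == "__main__":
    for name, size in iter_fileobjects(io.BytesIO(SAMPLE)):
        print(name, size)
```

Passing a local path instead of the in-memory sample works the same way, because iterparse accepts either a filename or a file object.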
Search the Corpus!
You can now search the corpus directly by name. The search results will show up to a thousand matching files and let you download a file directly or browse the directory in which it is contained.
Browse the Corpus!
All of our site data is stored in the Amazon S3 bucket s3://digitalcorpora/. You can download from that bucket. We recommend using the bucket directly in Amazon’s cloud. We get free data storage and transfer from Amazon as part of the Amazon Open Data Program, for which we are thankful!
You can browse the S3 bucket directly using our JavaScript-based browser.
You can also access the Digital Corpora directly using the AWS Command Line Interface. For example, you can list the files with aws s3 ls like this:
    $ aws s3 ls s3://digitalcorpora/corpora/
                               PRE bin/
                               PRE drives/
                               PRE drives_bulk_extractor/
                               PRE drives_dfxml/
                               PRE files/
                               PRE hashes/
                               PRE mobile/
                               PRE packets/
                               PRE ram/
                               PRE scenarios/
                               PRE sql/
    2020-11-21 10:56:19         43 README.txt
    2020-11-21 10:56:20    1783404 digitalcorpora.org-hashdeep-2020-04-01.csv
    2020-11-21 10:56:19    1787101 digitalcorpora.org-hashdeep-2020-05-01.csv
    2020-11-21 10:56:19    1794086 digitalcorpora.org-hashdeep-2020-06-01.csv
    2020-11-21 10:56:19    1794914 digitalcorpora.org-hashdeep-2020-07-01.csv
    2020-11-21 10:56:20    1796103 digitalcorpora.org-hashdeep-2020-08-01.csv
    2020-11-21 10:56:20    1796275 digitalcorpora.org-hashdeep-2020-09-01.csv
    2020-11-21 10:56:20    1796447 digitalcorpora.org-hashdeep-2020-10-01.csv
    2020-11-21 10:56:20    1796619 digitalcorpora.org-hashdeep-2020-11-01.csv
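If you want to work with such a listing from a script, one option is to parse the text that aws s3 ls prints. The following is a minimal sketch, assuming the output layout shown above ("PRE name/" rows for prefixes, and "date time size name" rows for files). The names Entry, parse_s3_ls, and list_prefix are hypothetical helpers introduced here, and list_prefix requires the AWS CLI to be installed and on your PATH.

```python
import subprocess
from datetime import datetime
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    name: str
    size: Optional[int]           # None for "PRE" (prefix) rows
    modified: Optional[datetime]  # None for "PRE" (prefix) rows


def parse_s3_ls(output: str) -> List[Entry]:
    """Parse the text printed by `aws s3 ls` into Entry records."""
    entries = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("PRE "):
            entries.append(Entry(line[4:].strip(), None, None))
            continue
        # Split at most three times so names containing spaces stay whole.
        date, time, size, name = line.split(None, 3)
        stamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
        entries.append(Entry(name, int(size), stamp))
    return entries


def list_prefix(uri: str = "s3://digitalcorpora/corpora/") -> List[Entry]:
    """Run `aws s3 ls` on `uri` and return the parsed entries."""
    result = subprocess.run(
        ["aws", "s3", "ls", uri], capture_output=True, text=True, check=True
    )
    return parse_s3_ls(result.stdout)
```

For instance, filtering the result of list_prefix() for entries whose size is None gives just the sub-prefixes such as drives/ and packets/.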
Please note that you cannot download files in bulk by pointing a web scraper at the URLs displayed by the S3 bucket browser, because the directory listings are generated by JavaScript running in your web browser. If you wish to download files from the Digital Corpora in bulk, please use the AWS CLI. For example, to download all of the files under corpora/dfrws into a local directory named dfrws, you could use the command:

    $ aws s3 cp --recursive s3://digitalcorpora/corpora/dfrws dfrws
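A bulk download like this can also be driven from a script. The sketch below only wraps the AWS CLI command shown above; build_download_command and download are hypothetical helper names. The optional --dryrun flag asks the CLI to print what it would copy without transferring anything, and --no-sign-request sends unsigned requests; whether the bucket accepts unsigned requests is an assumption you should verify for yourself.

```python
import subprocess
from typing import List

BUCKET = "digitalcorpora"  # bucket name taken from the text above


def build_download_command(
    prefix: str, dest: str, dry_run: bool = False, anonymous: bool = False
) -> List[str]:
    """Build the `aws s3 cp --recursive` argument list for one prefix."""
    cmd = [
        "aws", "s3", "cp", "--recursive",
        f"s3://{BUCKET}/{prefix.strip('/')}", dest,
    ]
    if dry_run:
        cmd.append("--dryrun")           # preview only, nothing is copied
    if anonymous:
        cmd.append("--no-sign-request")  # assumption: bucket allows this
    return cmd


def download(prefix: str, dest: str, **options) -> None:
    """Run the command; raises CalledProcessError if the CLI fails."""
    subprocess.run(build_download_command(prefix, dest, **options), check=True)
```

Calling download("corpora/dfrws", "dfrws", dry_run=True) first is a cheap way to see how many files a prefix holds before committing to the transfer.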
Publications
Publications describing these corpora and our related research can be found on our publications page.
Teacher's Solutions
Some of our scenarios have solutions available! In general, solutions are restricted to:
- Faculty members of accredited non-profit educational institutions.
- Individuals who are employees of the US Government or US Government contractors who are engaged in digital forensics training or research.
In some circumstances, the teacher's solutions will also be made available to individuals working with foreign partners of the US government.
Solutions are distributed as password-encrypted PDF and ZIP files; they are only available to faculty at accredited educational institutions and to employees of government or law enforcement organizations who are working as researchers or trainers.
Information on obtaining the solutions can be found here: Obtaining Solutions.
Recent News
- CIRCL Forensics Exercises (2023-05-05 08:57:31)
- Compiled bulk_extractor 2.0 ready for download (2023-03-26 19:42:07)
- Android 13 Image (2022-12-24 10:06:43)
- New Android 11 and 12 Images! (2022-09-06 00:00:23)
- 19 New Scenarios! (2022-07-24 00:52:01)
Citing the corpora
If you are writing a research article that uses data from this site, please cite our paper:
Garfinkel, Farrell, Roussev and Dinolt, Bringing Science to Digital Forensics with Standardized Forensic Corpora, DFRWS 2009, Montreal, Canada.