Archive for the ‘General’ Category

website transition

April 29th, 2017 No comments

The website has been transitioned to Dreamhost. The downloads remain at George Mason University and can be reached at http://downloads.digitalcorpora.org/corpora/ for the corpora and http://downloads.digitalcorpora.org/downloads/ for files.

Categories: General Tags:

“non-deterministic” USB image contributed

May 27th, 2014 No comments

We are happy to announce the contribution of four disk images of a non-deterministic USB drive. Read More.

Categories: General Tags:

Announcing New File Type Sample Files

February 5th, 2014 No comments

UT San Antonio has kindly provided digitalcorpora with open source, publicly releasable samples of 32 file types. These are the samples that were used by Dr. Nicole Beebe to develop the Sceadan File Type Classifier.

Included file types are ASP, AVI, B64, B85, BZ2, CSS, DLL, ELF, EXE, EXT3, FAT, FLV, JAR, JB2, JS, M4A, MOV, MP3, MP4, NTFS, PST, RPM, RTF, Random, SWF, TXT, Tbird, URL, WAV, WMA, XLSX, ZIP. Each file type sample can be downloaded from the website:
* http://digitalcorpora.org/corp/nps/files/filetypes1/

Also included is a _README directory that includes a list of every file downloaded and a copyright statement for the files that are covered under copyright. You can access that directory at:
* http://digitalcorpora.org/corp/nps/files/filetypes1/_README/

This “FILETYPES1” corpus supplements the files in the GOVDOCS1 corpus.

Please let us know if you use these by including this citation in your paper:

“FILETYPES1 File type samples,” Beebe, Nicole, University of Texas, San Antonio, hosted at http://digitalcorpora.org/corp/nps/files/filetypes1/. 2014

Categories: Files, General Tags:

Announcement: hashdb toolset

October 24th, 2013 No comments

The text file govdocs1-first512-first4096-docid.txt, which contained MD5 hashes of the first 512 bytes and the first 4096 bytes of every file in the GOVDOCS1 corpus, has been removed. This file was provided to assist with research on block hashes. We have since created the hashdb toolset, which provides support for creating and working with block hash databases. Please refer to https://github.com/simsong/hashdb/wiki to download the code, follow continuing progress on this topic, and find links to relevant papers, including:

Distinct Sector Hashes for Target File Detection

A related master's thesis on this topic was completed at the Naval Postgraduate School in 2012 and can be downloaded for additional reading: http://simson.net/ref/2012/kmf_thesis.pdf
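
As a rough illustration of the kind of data the removed file held, the sketch below computes MD5 hashes of the first 512 and first 4096 bytes of a file using only the Python standard library. The function names and the tab-separated output are assumptions made for this example; the actual column layout of govdocs1-first512-first4096-docid.txt is not described in this post, and this is not how hashdb itself is implemented.

```python
import hashlib
from pathlib import Path


def prefix_md5s(path, sizes=(512, 4096)):
    """Return {size: MD5 hex digest} for the first `size` bytes of a file.

    Files shorter than a given size are hashed over whatever bytes exist.
    """
    with open(path, "rb") as f:
        head = f.read(max(sizes))  # one read covers every requested prefix
    return {n: hashlib.md5(head[:n]).hexdigest() for n in sizes}


def describe(path):
    """Format one line per file: name, first-512 hash, first-4096 hash.

    The tab-separated layout is hypothetical, chosen only for readability.
    """
    digests = prefix_md5s(path)
    return f"{Path(path).name}\t{digests[512]}\t{digests[4096]}"
```

Calling `describe()` on each file in a local copy of the corpus would produce one line per document. For work on full block hash databases, the hashdb toolset linked above is the better starting point.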

Categories: General Tags:

Malware Scan of Govdocs1 now available

August 15th, 2013 No comments

A malware scan of the govdocs1 corpus is now available at http://digitalcorpora.org/corp/nps/files/govdocs1/MetascanClientLog_201306281214.txt

Categories: General Tags:

Bulk Extractor News and Downloads

April 3rd, 2013 No comments

File bulk_extractor-1.3.1.zip contains the source code for bulk_extractor v1.3.1. bulk_extractor is a C++ program that scans a disk image, a file, or a directory of files and extracts useful information without parsing the file system or file system structures. bulk_extractor is typically downloaded on a Fedora system and compiled or cross-compiled for Linux, Mac, or Windows using autotools. Please see https://github.com/simsong/bulk_extractor/wiki/Introducing-bulk_extractor.
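
To make the idea of extracting information without parsing the file system concrete, here is a toy Python sketch that searches a raw byte buffer for email-like strings and reports their byte offsets. It is only an illustration of the general approach: the regular expression, the function name `scan_bytes`, and the output shape are assumptions made for this example, and none of it is taken from the bulk_extractor source.

```python
import re

# Deliberately simple email-like pattern over raw bytes (an assumption for
# this sketch; real feature scanners are considerably more careful).
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def scan_bytes(data, base_offset=0):
    """Yield (offset, text) for each email-like string found in `data`.

    The buffer is treated as an opaque run of bytes: no partition table,
    file system, or file boundaries are consulted.
    """
    for match in EMAIL_RE.finditer(data):
        yield base_offset + match.start(), match.group().decode("ascii")


if __name__ == "__main__":
    sample = b"\x00\x01junk alice@example.com\xff\xfe more bob@test.org\x00"
    for offset, text in scan_bytes(sample):
        print(offset, text)
```

A practical scanner would stream a large image in overlapping chunks so that strings straddling a chunk boundary are not missed; this sketch leaves that out to stay short.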

BEViewer.jar is the executable Bulk Extractor Viewer (BEViewer), a graphical user interface for browsing features that have been extracted by bulk_extractor. Please see https://github.com/simsong/bulk_extractor/wiki/BEViewer.

be_installer-1.3.exe is a Windows installer for installing bulk_extractor and BEViewer v1.3 on a Windows system.

bulk_extractor.pdf, “Digital media triage with bulk data analysis and bulk-extractor,” discusses how the bulk_extractor tool is effective in providing bulk data analysis.

2012-08-08 bulk_extractor Tutorial.pdf describes how to use the BEViewer tool. Although some of the parameters for running bulk_extractor have changed, the majority of the tutorial remains current.

Source: The information above and links were received from Bruce Allen <bdallen@nps.edu>, Naval Postgraduate School

See other bulk_extractor downloads here: http://digitalcorpora.org/downloads/bulk_extractor/

Categories: General Tags:

Hash Codes

March 29th, 2013 No comments

The following post is now obsolete. The file frequent_hashcodes_and_paths_rdc.xml has been removed from the corpus as explained in a more recent post. Please see:

http://digitalcorpora.org/archives/391

Deprecated Post from Mar 29, 2013 @ 13:13

The file frequent_hashcodes_and_paths_rdc.xml contains SHA1 hashcode and path data derived from the Real Drive Corpus collected by the DEEP Project at the U.S. Naval Postgraduate School. The file provides two kinds of data useful to forensic investigators: (1) SHA1 hashcodes that occurred for undeleted files on at least five different drives in the corpus but did not occur in the National Software Reference Library (http://www.nsrl.nist.gov). These are likely to indicate files that are uninteresting and can be excluded in most forensic investigations. File sizes and names are also given. (2) Path names (file name plus all directories) that occurred for undeleted files on at least twenty different drives in the corpus. These usefully supplement the hashcodes in indicating recurring files that are uninteresting to investigators. However, occurrences of these files could include viruses and other malware, or could be hiding illegal content, although this is unlikely.
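
The selection rule described in (1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the DEEP Project's actual procedure: the input is assumed to be a stream of (drive identifier, SHA1) pairs for undeleted files, the known-hash set stands in for NSRL, and the function name and parameters are hypothetical.

```python
from collections import defaultdict


def frequent_unknown_hashes(observations, known_hashes, min_drives=5):
    """Return SHA1 values seen on at least `min_drives` distinct drives
    that do not appear in `known_hashes`.

    observations: iterable of (drive_id, sha1_hex) pairs, one per
        undeleted file (hypothetical input format).
    known_hashes: iterable of SHA1 hex strings, standing in for NSRL.
    """
    drives_by_hash = defaultdict(set)
    for drive_id, sha1 in observations:
        # A set per hash counts each drive once, however many copies it holds.
        drives_by_hash[sha1.lower()].add(drive_id)

    known = {h.lower() for h in known_hashes}
    return {
        sha1
        for sha1, drives in drives_by_hash.items()
        if len(drives) >= min_drives and sha1 not in known
    }
```

The same counting approach would apply to the path data in (2) by substituting full path names for hashes and raising `min_drives` to twenty.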

Read more …  http://digitalcorpora.org/corp/nus-deidentified/README-frequent-hashcodes-and-paths-rdc.txt
Download XML File (HAS BEEN REMOVED): http://digitalcorpora.org/corp/nus-deidentified/frequent-hashcodes-and-paths-rdc.xml   (102 MB)

Categories: General Tags:

Forensic Innovations, Inc., analyzes the million file corpus

June 18th, 2010 No comments

Forensic Innovations, Inc., makers of File Investigator TOOLS, has performed an analysis of the 986,278 files in the “1 million file corpus”. (13,722 files in the corpus were removed earlier this year because they were from California State Government web servers that were in the .gov domain and were mistakenly collected as part of the original collection effort.)

We would like to thank Forensic Innovations for their work in support of this project. We have made available their summary report and will be making available their file-by-file analysis as soon as we deploy an appropriate database on this website.

Categories: General Tags:

Open Source Forensics Conference

March 21st, 2010 No comments

We will be making a presentation and handing out DVDs filled with data at the Open Source Forensics Conference, held in conjunction with the Basis Technology Government User’s Conference, June 8-9, 2010, at the Westfield Marriott in Chantilly, VA.

Basis Open Source Forensics Conference. June 9, 2010. Held in conjunction with Basis Technology's Government User Conference

Categories: General Tags:

ISO 9660 disk images from anti-forensics.ru posted

March 8th, 2010 No comments

Our friends at anti-forensics.ru have given us seven very small disk images that are designed to demonstrate failings of particular open source Linux distributions.

You can view all of the images at http://digitalcorpora.org/corp/images/aor/. The images you will find there include:

These images should be directly copied to a hard drive or a partition. Forensic Linux distributions would use them as root file systems and execute proof-of-concept code during the boot.

Details of why these images are useful can be found on the author’s website, at: http://www.computer-forensics-lab.org/pdf/Linux_for_computer_forensic_investigators_2.pdf

Categories: General Tags:
"This material is based upon work supported by the National Science Foundation under Grant No. 0919593. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation."