Search Results

Keyword: ‘nps’


May 27th, 2014 No comments

The submission contains four raw (dd) image files of the USB flash disk «Transcend JF V10 / 1GB, D33193», two packet capture (pcap) files and four log files. The disk is non-partitioned and contains no file systems; it contains many non-deterministic sectors (each sector contains 512 bytes).

Namely, each sector that doesn’t belong to a written block of flash memory cells contains non-deterministic data (instead of null bytes, as many forensic examiners tend to expect). The disk does function properly, though. Several tests show that writing to a sector puts its contents into a deterministic state (i.e. you will read back exactly what you wrote).

It took days to understand why the disk contains non-deterministic blocks of data. The study showed that each non-deterministic sector holds the contents of the SCSI READ(10) command that was used to read that sector. In other words, when the disk receives a SCSI READ command that covers non-written sectors, it simply sends the contents of the command back to the host, and these contents appear as sector data to the operating system.

In the experiment, two raw images of the USB flash disk were acquired on a Linux host using dc3dd (these image files, together with the corresponding dc3dd log files, can be found in «linux-dc3dd/»), and two more raw images were acquired on a Windows 7 host using FTK Imager (image files and log files are located in «windows7-ftkimager/»); all four images have different hash values. The Windows host was also running capture software to intercept all USB commands and replies; this data was written to pcap files named «usb-1» and «usb-2» (for the first and the second acquisition, respectively). There were no writes to the disk during or between acquisitions. The disk was disconnected between acquisitions on the Windows host so that a new tag would be assigned to the command blocks of all SCSI READ(10) commands sent to the disk. Unlike Linux, which assigns a new tag to the command block of every SCSI READ(10) command, Windows uses the same tag in the command blocks of all SCSI READ(10) commands; the tag seems to be generated randomly when a disk is connected via USB. Without the reconnection, the two images would have had the same hash value on the Windows host (the results of hashing the disk twice without reconnecting it are shown in the screenshot located at «windows7-ftkimager/ftk-imager-screenshot.png»).

Let’s look at sector #100005 in the four acquired images (dd options: skip=100004 count=1).

«linux-dc3dd/flash-firstrun.dd» has the following data:
00000000 55 53 42 43 94 06 00 00 00 80 00 00 80 00 0a 28 |USBC...........(|
00000010 00 00 01 86 80 00 00 40 00 00 00 00 00 00 00 60 |.......@.......`|
00000020 00 60 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |.`..............|
00000030 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|

«linux-dc3dd/flash-secondrun.dd» has the following data:
00000000 55 53 42 43 00 7f 00 00 00 80 00 00 80 00 0a 28 |USBC...........(|
00000010 00 00 01 86 80 00 00 40 00 00 00 00 00 00 00 44 |.......@.......D|
00000020 00 44 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |.D..............|
00000030 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|

«windows7-ftkimager/flash-firstrun.001» has the following data:
00000000 55 53 42 43 d8 b1 6b 91 00 40 00 00 80 00 0a 28 |USBC..k..@.....(|
00000010 00 00 01 86 a0 00 00 20 00 00 00 00 00 00 00 5a |....... .......Z|
00000020 00 5a ff ff ff ff ff ff ff ff ff ff ff ff ff ff |.Z..............|
00000030 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|

«windows7-ftkimager/flash-secondrun.001» has the following data:
00000000 55 53 42 43 20 6a 7f 83 00 40 00 00 80 00 0a 28 |USBC j...@.....(|
00000010 00 00 01 86 a0 00 00 20 00 00 00 00 00 00 00 79 |....... .......y|
00000020 00 79 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |.y..............|
00000030 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|

As you can see, the sectors are slightly different. Now let's dissect the data in the first hexadecimal dump:

00000000 55 53 42 43 94 06 00 00 00 80 00 00 80 00 0a 28 |USBC...........(|
00000010 00 00 01 86 80 00 00 40 00 00 00 00 00 00 00 60 |.......@.......`|
00000020 00 60 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |.`..............|
00000030 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|

The letters «USBC» point us to the USB command block, which starts with the four-byte signature «USBC». The structure of the USB command block is presented below (taken from the Linux kernel source code):

struct bulk_cb_wrap {
    __le32 Signature; /* contains 'USBC' */
    __u32 Tag; /* unique per command id */
    __le32 DataTransferLength; /* size of data */
    __u8 Flags; /* direction in bit 0 */
    __u8 Lun; /* LUN normally 0 */
    __u8 Length; /* of of the CDB */
    __u8 CDB[16]; /* max command */
};

The CDB field contains the actual command transmitted. The first byte of the CDB field is 0x28, which is the operation code of the SCSI READ(10) command (see Table 85 in the «SCSI Commands Reference Manual» by Seagate). The SCSI READ(10) command is exactly ten bytes long (not counting the preceding USB command block header).
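The header fields described above can be unpacked with a few lines of Python. This is only a minimal sketch: the function name `parse_cbw` is hypothetical, the bytes are copied from the first 32 bytes of the first hexadecimal dump, and the tag is assumed to be read as a little-endian integer (the device treats it as an opaque value).

```python
import struct

# First 32 bytes of the sector as found in «linux-dc3dd/flash-firstrun.dd»,
# copied from the first hexadecimal dump.
SECTOR_HEAD = bytes.fromhex(
    "55534243940600000080000080000a28"
    "00000186800000400000000000000060"
)

def parse_cbw(data: bytes) -> dict:
    """Unpack the 31-byte USB command block (struct bulk_cb_wrap)."""
    # Signature, Tag, DataTransferLength, Flags, Lun, Length, CDB[16]
    sig, tag, dlen, flags, lun, cdb_len, cdb = struct.unpack_from("<4sIIBBB16s", data)
    if sig != b"USBC":
        raise ValueError("not a USB command block")
    return {
        "tag": tag,                    # assumption: shown as little-endian
        "data_transfer_length": dlen,  # bytes requested from the device
        "flags": flags,
        "lun": lun,
        "cdb": cdb[:cdb_len],          # only the meaningful CDB bytes
    }

cbw = parse_cbw(SECTOR_HEAD)
print(hex(cbw["tag"]), cbw["data_transfer_length"], cbw["cdb"].hex())
```

For this dump the data transfer length is 32768 bytes, i.e. 64 sectors of 512 bytes each, and the CDB is ten bytes long and starts with 0x28.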

struct read_10 {
    __u8 opCode; /* operational code (28h) */
    __u8 Flags; /* various flags */
    __u32 LBA; /* logical block address (MSB first) */
    __u8 Group; /* group number */
    __u16 TransferLength; /* transfer length (MSB first) */
    __u8 Control; /* control byte */
};

The contents of the SCSI READ(10) command are as follows: the logical block address is 0x18680 (99968 in decimal), and the transfer length is 0x40 logical blocks (64 in decimal). Note that the sector we were analyzing, #100005, lies between 99968 and 100032 (99968+64). Now let’s check what data is present in sectors #99967 through #100032 (one-liner for bash: «for i in `seq 99967 100032`; do echo -n "$i: "; dd if=flash-firstrun.dd skip=$i count=1 2> /dev/null | md5sum; done»): sectors #99968 through #100031 all have the same data as sector #99968, while sectors #99967 and #100032 both differ from them. The conclusion is that sectors #99968 through #100031 hold non-deterministic data, which represents the contents of the SCSI READ(10) command used to read that sector range.
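The arithmetic above can be checked with a short Python sketch. The function name `parse_read10` is hypothetical; the ten CDB bytes are copied from offsets 0x0f through 0x18 of the first hexadecimal dump, and the sector is identified here by its dd offset (skip=100004).

```python
import struct

# The ten READ(10) bytes from the first dump (offsets 0x0f through 0x18).
CDB = bytes.fromhex("28000001868000004000")

def parse_read10(cdb: bytes) -> tuple:
    """Return (LBA, transfer length) from a READ(10) CDB (both MSB first)."""
    # opCode, Flags, LBA, Group, TransferLength, Control
    opcode, _flags, lba, _group, length, _control = struct.unpack(">BBIBHB", cdb[:10])
    if opcode != 0x28:
        raise ValueError("not a READ(10) command")
    return lba, length

lba, length = parse_read10(CDB)
covered = range(lba, lba + length)     # sectors requested by this command
print(lba, length, 100004 in covered)  # prints: 99968 64 True
```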

A program was written to study all non-deterministic sectors in the same way as described above, and the results are the same: every non-deterministic sector contains the corresponding SCSI READ(10) command.
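The original program is not included here. As a rough illustration only, a scan of this kind could look like the following Python sketch, in which the names `echoed_read10` and `scan_image` are hypothetical and the sector size is assumed to be 512 bytes.

```python
import struct

SECTOR_SIZE = 512  # assumption: 512-byte sectors, as stated for this disk

def echoed_read10(sector: bytes):
    """Return (lba, length) if the sector starts with a USB command block
    that carries a READ(10) command, otherwise None."""
    if len(sector) < 31 or sector[:4] != b"USBC" or sector[15] != 0x28:
        return None
    # The CDB starts at offset 15: opcode (15), flags (16), LBA (17..20),
    # group (21), transfer length (22..23); multi-byte fields are MSB first.
    lba, _group, length = struct.unpack_from(">IBH", sector, 17)
    return lba, length

def scan_image(path):
    """Yield (index, lba, length, consistent) for every echoed sector.
    'consistent' tells whether the sector index lies inside the range
    requested by the command found in that sector."""
    with open(path, "rb") as image:
        index = 0
        while True:
            sector = image.read(SECTOR_SIZE)
            if len(sector) < SECTOR_SIZE:
                break
            decoded = echoed_read10(sector)
            if decoded is not None:
                lba, length = decoded
                yield index, lba, length, lba <= index < lba + length
            index += 1
```

Calling `list(scan_image("flash-firstrun.dd"))` would then list every sector that echoes a command, together with a flag showing whether the command really covers that sector.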

The data can be downloaded from:

Related links
1. (Flash drives and acquisition by Dominik Weber)
2. (FAT32 strangeness by «Fab4»)



February 10th, 2011 1 comment

2010-nps-emails is a test disk that can be used for testing programs that find email addresses or perform string search.

The disk image consists of 30 different email addresses, each one stored in a different document with a different coding scheme.

Below is a list of the email addresses and their encodings:

email address    Application (Encoding)
                 Apple TextEdit (UTF-8)
                 Apple TextEdit print-to-PDF (/FlateDecode)
                 Apple TextEdit (RTF)
                 Apple TextEdit print-to-PDF (/FlateDecode)
                 Apple TextEdit (UTF-16)
                 Apple TextEdit print-to-PDF (/FlateDecode)
                 Apple Pages '09
                 Apple Pages (comment) '09
                 Apple Keynote '09
                 Apple Keynote '09 (comment)
                 Apple Numbers '09
                 Apple Numbers '09 (comment)
                 Microsoft Word 2008 (Mac) (.doc file)
                 Microsoft Word 2008 (Mac) print-to-PDF
                 Microsoft Word 2008 (Mac) print-to-PDF (.docx file)
                 Microsoft Word 2008 (Mac)
                 Microsoft Word 2008 (Mac)
                 Microsoft Word 2008 (Mac) (Comment)
                 Microsoft Word 2007 (OLE .doc file within .doc)
                 Microsoft Word 2007 (OLE .doc file within .doc)
                 Microsoft PowerPoint and Word 2007 (OLE .ppt file within .doc)
                 Microsoft PowerPoint and Word 2007 (OLE .pptx file within .docx)
                 Microsoft Excel and Word 2007 (OLE .xls file within .doc)
                 Microsoft Excel and Word 2007 (OLE .xlsx file within .docx)
                 text file within ZIP
                 ZIP'ed text file, ZIP'ed
                 text file within GZIP
                 GZIP'ed text file, GZIP'ed
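To illustrate why such a test disk is useful, here is a minimal Python sketch showing that a plain byte search misses an address stored as UTF-16 or inside a GZIP stream unless the data is decoded first. The address `user@example.com` and the names `views` and `find_emails` are hypothetical, and the regular expression is a simplification; the sketch does not handle the PDF, OLE, or ZIP cases listed above.

```python
import gzip
import re
import zlib

# Simplified pattern for illustration; real address syntax is broader.
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def views(raw: bytes):
    """Yield alternative byte views in which an address may be readable."""
    yield raw  # ASCII / UTF-8 text as stored
    # UTF-16LE text: decoding and re-encoding drops the interleaved zero bytes
    yield raw.decode("utf-16-le", errors="ignore").encode("ascii", errors="ignore")
    if raw[:2] == b"\x1f\x8b":  # GZIP magic number
        try:
            yield gzip.decompress(raw)
        except (OSError, EOFError, zlib.error):
            pass  # damaged or truncated stream: nothing more to try

def find_emails(raw: bytes) -> set:
    """Collect addresses found in any of the views of the data."""
    return {m.decode("ascii") for view in views(raw) for m in EMAIL_RE.findall(view)}

address = "user@example.com"  # hypothetical address, not from the disk image
print(EMAIL_RE.search(address.encode("utf-16-le")))  # None: naive search fails
print(find_emails(address.encode("utf-16-le")))      # found after decoding
```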

The image can be downloaded from


“non-deterministic” USB image contributed

May 27th, 2014 No comments

We are happy to announce the contribution of four disk images of a non-deterministic USB drive.

Categories: General Tags:

Announcing New File Type Sample Files

February 5th, 2014 No comments

UT San Antonio has kindly provided digitalcorpora with open source, publicly releasable samples of 32 file types. These are the samples that were used by Dr. Nicole Beebe to develop the Sceadan File Type Classifier.

Included file types are ASP, AVI, B64, B85, BZ2, CSS, DLL, ELF, EXE, EXT3, FAT, FLV, JAR, JB2, JS, M4A, MOV, MP3, MP4, NTFS, PST, RPM, RTF, Random, SWF, TXT, Tbird, URL, WAV, WMA, XLSX, ZIP. Each file type sample can be downloaded from the website:

Also included is a _README directory that includes a list of every file downloaded and a copyright statement for the files that are covered under copyright. You can access that directory at:

This “FILETYPES1” corpus supplements the files in the GOVDOCS1 corpus.

Please let us know if you use these by including this citation in your paper:

“FILETYPES1 File type samples,” Beebe, Nicole, University of Texas, San Antonio, hosted at 2014

Categories: Files, General Tags:

Malware Scan of Govdocs1 now available

August 15th, 2013 No comments

A malware scan of the govdocs1 corpus is now available at


Categories: General Tags:

Obtaining Solutions

May 16th, 2013 No comments


Solution packets for these scenarios are available as encrypted PDF files:

The decryption password is provided to faculty members teaching courses in digital forensics at accredited educational institutions. To get the solution, please contact us using the WordPress contact form and provide:

  • your full name
  • your phone number
  • an official web page that describes your course and clearly indicates your email address
  • how many students, and at what level (undergraduate, graduate), will be using the materials
  • whether or not we can put you on an announcement-only mailing list regarding new teaching materials we are developing

Thank you!


Bulk Extractor News and Downloads

April 3rd, 2013 No comments

File contains the source code for bulk_extractor v1.3.1.  bulk_extractor is a C++ program that scans a disk image, a file, or a directory of files and extracts useful information without parsing the file system or file system structures.  bulk_extractor is typically downloaded on a Fedora system and compiled or cross-compiled to Linux, Mac, or Windows using autotools.  Please see

BEViewer.jar is an executable bulk_extractor viewer user interface.
Bulk Extractor Viewer (BEViewer) provides a graphical user interface for browsing features that have been extracted via the bulk extractor feature extraction tool.  Please see

be_installer-1.3.exe is a Windows installer for installing bulk_extractor and BEViewer v1.3 on a Windows system.

bulk_extractor.pdf, “Digital media triage with bulk data analysis and bulk-extractor,” discusses how the bulk_extractor tool is effective in providing bulk data analysis.

2012-08-08 bulk_extractor Tutorial.pdf describes how to use the BEViewer tool. Although some of the parameters for running bulk_extractor have changed, the majority of the tutorial remains current.

Source: The information above and links were received from Bruce Allen, Naval Postgraduate School

See other bulk_extractor downloads here:

Categories: General Tags:

35GB of JPEGs ready for download

March 7th, 2012 2 comments

We have created a tar and a ZIP file with 109,223 files from the govdocs1m corpus. You can download them from:   [37.6 GB]   [36.8 GB]

Please note that the ZIP file is necessarily a ZIP-64 file and will not decompress with the ZIP implementation built into MacOS or Windows.

Categories: Files Tags:

Real Data Corpus

February 21st, 2011 No comments

The Real Data Corpus (RDC) is a collection of raw data extracted from data-carrying devices that were purchased on the secondary market around the world. Many studies have shown that hard drives, cell phones, USB memory sticks, and other data-carrying devices are frequently discarded by their original users without the data first being cleared or purged. By purchasing these devices and extracting their data, we have created a data set that closely mimics data as it is found in the real world.

Potential Uses

The Real Data Corpus is a one-of-a-kind scientific resource for:

  • Developing and validating forensic and data recovery tools.
  • Training students in forensics and data recovery
  • Developing and validating document translation software.
  • Exploring and characterizing real-world computing practices, configuration choices, and option settings.
  • Studying the storage allocation strategies of file systems under real-world conditions

Current Contents

As of February 21, 2011, the Non-US Person’s Corpus consists of the following:

  • 1,289 hard drive images ranging in size from 500MB to 80GB.
  • 643 flash memory images (USB, Sony Memory Stick, SD and other), ranging from 128MB to 4GB.
  • 98 CDROMs

For a total of 70TB of data (uncompressed).

Access and Availability

The Real Data Corpus can be distributed to sponsors and collaborators as a set of AFF and E01 files. The AFF files are encrypted with AES 256 and can be based on either a pass phrase or X.509 PKI using AFF encryption.

Disk images can be downloaded over the Internet from a secure server using SSL by authorized researchers. Alternatively, we can package the files onto portable terabyte USB hard drives.

Researchers can be given an account on a multi-user Linux computer on which all of the corpora reside.

In general, use of the RDC is limited to bona fide researchers operating under the oversight of an Institutional Review Board that has a DoD Assurance. For additional information, please read the Real Data Corpus FAQ.

Current Contents

Corpus   Hard Drives   Flash Drives   Optical   GB (Total Uncompressed)
BA                 7                                                 38
CA                73              1                               1,064
CE                 1                                                 82
CH                 2                                                  5
CN               143            568        98                     3,627
DE                36              1                                 755
GR                13                                                 27
IL               229              4                               2,226
IN               487             66                              26,512
MX               175                                              1,110
NZ                 1                                                  4
PS                98                                                957
TH                 1              3                                  13
UA                23                                                565
Total          1,289            643        98                    36,990

For additional information about the Real Data Corpus, please see the Data Sets Page of the Digital Evaluation and Exploitation (DEEP) Research Group at the Naval Postgraduate School Computer Science Department. At the present time the Real Data Corpus is a restricted access data set and is not generally available outside of the United States.



February 8th, 2011 4 comments

The M57-Jean scenario is a single disk image scenario involving the exfiltration of corporate documents from the laptop of a senior executive. The scenario involves a small start-up company, M57.Biz. A few weeks after the company’s inception, a confidential spreadsheet containing the names and salaries of the company’s key employees was found posted to the “comments” section of a competitor’s website. The spreadsheet existed only on the laptop of one of M57’s officers, Jean.

Jean says that she has no idea how the data left her laptop and that she must have been hacked.

You have been given a disk image of Jean’s laptop. Your job is to figure out how the data was stolen—or if Jean isn’t as innocent as she claims.



The solution is distributed as an encrypted PDF file:

Please see our note on obtaining solutions.


"This material is based upon work supported by the National Science Foundation under Grant No. 0919593. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation."