Forensic Innovations, Inc., has kindly made available this simple statistical report of the govdocs1 digital corpus:
Summary of Investigated Source:
Sum: Included: Filtered: Total:
----------------------------------------------------------------------- ---------- ---------- ----------
Files 986278 0 986278
Directories 0 0 0
File Sizes (KB) 488658258 0 488658258
Wrong File Extension 33917 0 33917
Scan Time (hh:mm:ss) 10:12:36 0:00:01 10:12:37
Attributes:
Archive 1 0 1
Creation Time: (When a file was initially created or last copied to.)
Oldest 05/21/2010
Latest 05/21/2010
Write Time: (When the contents of a file were last changed.)
Oldest 12/31/1969
Latest 06/02/2028
Access Time: (When the contents or metadata of a file were last read or changed.)
Oldest 05/21/2010
Latest 05/21/2010
Accuracy:
High (99.99%) 913241 0 913241
Medium (90%) 71970 0 71970
Low (50%) 337 0 337
None 730 0 730
Platforms:
N/A 730 0 730
Commodore Amiga/64 239058 273 239331
IBM OS/2 585079 0 585079
Apple Macintosh (Intel) 1840 0 1840
Apple Macintosh (Motorola) 670414 0 670414
MS Windows 859022 0 859022
MS/PC DOS 721408 0 721408
Sun OS 582967 0 582967
UNIX 770608 0 770608
Atari 22 0 22
MS Windows Mobile/Pocket PC/CE 1641 0 1641
Palm OS 1649 0 1649
Linux 3514 0 3514
Storage:
N/A 173 0 173
Cabinet/Archive 14810 0 14810
Binary 559855 0 559855
Bitmap/Raster 151790 0 151790
Digital Audio 8 1 9
Music Notes 14 0 14
Text 620526 0 620526
Translated 573 0 573
Vector 546 0 546
Floating Header 346337 0 346337
Compressed 14746 0 14746
Encrypted 8 0 8
Content:
N/A 882 0 882
Video 3483 0 3483
Database 24820 0 24820
Email 2004 3 2007
Document 539100 0 539100
Font 231275 0 231275
Game Data 0 15 15
Graphic Image 464870 0 464870
Hypertext 467404 1 467405
Personal/User Data 961914 0 961914
Library of Functions 2 0 2
Macro/Script 351781 0 351781
Program Data 41616 0 41616
Program Executable 277 0 277
Raw Printer Data 26190 0 26190
Shortcut/Link 5 0 5
Sound/Audio 10 0 10
Source Code 36580 0 36580
Spreadsheet 85110 0 85110
Template 306 0 306
Text 727217 0 727217
Presentation 222 0 222
CAD/3D Model 138 0 138
Archived Files 14093 0 14093
Form 2 0 2
Encryption Key 1 0 1
File Types:
File Type: Type Index: Included: Filtered: Total:
--------------------------------------------------------------- ----- ---------- ---------- ----------
Text File 3 88856 0 88856
Graphics Interchange Format 4 36302 0 36302
MS Windows Bitmap 5 72 0 72
PK Zip Archive 12 14 0 14
Targa Bitmap Image 19 1 0 1
DOS Batch File 22 3 0 3
AutoCAD Drawing 26 2 0 2
BASIC Script/Source Code 28 19 0 19
Gzip Unix Archive 44 14021 0 14021
Arc Archive 46 0 1 1
Zoo Compressed Archive 56 1 0 1
Help File 57 7 0 7
dBase III/III+/IV/FoxBase+/FoxPro Database 58 2662 0 2662
Dr. Halo Picture 67 0 5 5
AutoCAD Drawing Exchange (ASCII) 73 39 0 39
Lotus 123 Ver. 1 & 2 Worksheet 104 7 0 7
Lotus 123 Ver. 3 & 4 Worksheet 105 2 0 2
Lotus 123 Ver. 1 Worksheet 106 1 0 1
MS Excel Worksheet/Template (OLE) 111 62876 0 62876
Encapsulated PostScript Preview 133 417 0 417
MS Works for DOS Document 134 1 0 1
Ventura Publisher File 152 2 0 2
WordPerfect Document 157 364 0 364
WordPerfect Support File 163 3 0 3
MS PowerPoint Slides/Add-on (OLE) 164 51292 0 51292
Encapsulated PostScript Document 171 5079 0 5079
X11 BitMap 179 7 0 7
JPEG File Interchange File 185 109283 0 109283
HP Printer Control Language File 193 1 0 1
MS Excel Workspace/Workbook 195 1 0 1
ACT! 2.0 Report 208 1 0 1
MS Word for Macintosh Document 216 138 0 138
MS Word for DOS/Macintosh Document 217 54 0 54
MS Word for Windows Document (OLE) 229 76605 0 76605
MS Windows MetaFile (placeable) 230 1 0 1
MS Windows 3.x Logo 252 1 0 1
Adobe Portable Document Format 258 231106 0 231106
BinHex Archive 259 4 0 4
MS Rich Text Format Document 269 1067 0 1067
MS Compound Document (OLE) 274 375 0 375
MS Windows Policy 299 2 0 2
Adobe PostScript Document 301 20630 0 20630
MS Windows Shortcut/Link 314 2 0 2
HyperText Markup Language 315 202440 0 202440
Tag Image File Format (Intel) 321 31 0 31
EXtensible Markup Language 342 32186 0 32186
Adobe PhotoShop Image 400 2 0 2
Java Script Source Code File 401 379 0 379
Source Code Make File 405 59 0 59
C/C++ Source Code File 406 173 0 173
Printer Job Language Image 408 20 0 20
Adobe PostScript Document (PJL) 410 431 0 431
MS Developer Studio Project 411 2 0 2
Virtual Reality World (Binary) 413 12 0 12
Cascading Style Sheet 415 157 0 157
MS Visual C++ Resource Script 416 2 0 2
Java Source Code File 418 235 0 235
Shockwave Flash Object 429 3473 0 3473
Adobe Illustrator Drawing 441 453 0 453
ANSI Text File 444 2 0 2
Active Server Page 447 149 0 149
Comma Separated Values Text File 451 18717 0 18717
Setup Information 455 198 0 198
Initialization File 456 133 0 133
Printer Separator Page 459 25 0 25
Adobe Linguistics File 468 1 0 1
UHArc Compressed Archive 484 1 0 1
HTML + XML Namespace 488 4416 0 4416
MS Visual Basic Class Module 510 1 0 1
Evolution Email Message 538 2 0 2
Horde Internet Messaging Program (IMP) Email Message 542 1 0 1
Mutt Email Message 546 1 0 1
EXtensible Style Language 570 10 0 10
MS Office Outlook 2003 Email Message 571 1 0 1
Microsoft Outlook 2000 IMO Email Message 574 6 0 6
MS Outlook Express Email Message 575 12 0 12
Pine Email Message 577 4 0 4
MS Office Macro Reference (OLE) 591 2 0 2
Common Gateway Interface Script 618 34 0 34
Code Page Translation File 636 1 0 1
MS Visual Studio Properties 656 2 0 2
Wise Installer Log 658 2 0 2
Log File (Unknown Source) 660 118 0 118
SGML Document Type Definition 686 255 0 255
AutoDesk Web Graphics Image 692 299 0 299
Internet Message 703 1251 0 1251
MS PowerPoint Slides (XML) 718 115 0 115
Fractal Image File 725 3 0 3
Flexible Image Transport System Bitmap 726 1057 0 1057
FrameMaker Document 730 28 0 28
GenePix Array List 755 146 0 146
MS Excel Graph 768 3 0 3
ISO 9660 CD-ROM Image (Data Mode 1) 803 1 0 1
Open Inventor 3d Scene (ASCII) 807 2 0 2
HP Printer Control Language Image (PJL) 836 3 0 3
Berkeley UNIX Mailbox Format 862 555 0 555
Eudora Mailbox 863 1 0 1
Monarch Graphic Image 870 1 0 1
Object Oriented Graphics Library: Quadrilaterals (ASCII) 906 4 0 4
Eudora Email Message 933 24 0 24
NASA Planetary Data Systems Image 938 162 0 162
Perl Application 955 66 0 66
AutoCAD Plot Drawing 960 1 0 1
Portable Network Graphics Bitmap 965 4125 0 4125
MacPaint Bitmap 966 3 0 3
Lotus Freelance Graphics 97 File 976 2 0 2
Python Tkinter / UNIX Shell Script 987 220 0 220
XML Resource Description Framework 1007 86 0 86
Semicolon Divided Values File 1051 138 0 138
Standard Generalized Markup Lang 1060 1641 0 1641
ArcView GIS Shape 1063 3 0 3
Structured Query Language Query 1079 154 0 154
Structured Query Language Report / Program 1080 5 0 5
Tape Archive (Compressed with Gzip) 1088 5 0 5
Thumbs Plus Database 1089 2 0 2
UU-Encoded File 1109 1 0 1
MS Visual Basic Project 1112 1 0 1
MS Visio 3/4 Document/Drawing/Shapes/Template 1130 6 0 6
MS Write / Word Backup 1136 35 0 35
MS Visual BASIC Source Code 1214 5 0 5
MS Visual BASIC Form 1223 2 0 2
MS Visual BASIC Script/Header 1225 2 0 2
Text File: Unicode/DoubleByte/UTF-16LE 1242 15 0 15
MS Excel Spreadsheet (XML) 1248 507 0 507
MS Word Document (XML) 1249 352 0 352
Text File (UTF-8) 1254 24 0 24
Source Code (General) 1256 1599 0 1599
Tab Separated Values Text File 1258 2822 0 2822
MS Windows Media Active Stream 1259 7 0 7
Pro/ENGINEER Geographic Image 1260 1 0 1
Internet Message (MIME) 1262 119 0 119
Adobe Portable Document (MacBinary) 1331 168 0 168
Generic Sound Sample 1367 1 0 1
NIST NSRL Hash Database 1377 10 0 10
Bzip Archive V2 1426 1 0 1
CPIO Archive 1427 1 0 1
UFA Compressed Archive 1438 35 0 35
PestPatrol Scan Strings 1454 1 0 1
MS Access Report / Snapshot 1521 26 0 26
Assembly Source Code File 1803 256 0 256
MS C# Source Code 1808 1 0 1
MS Outlook Rich Text Formatted Message 1811 2 0 2
MathCaD Document 1898 4 0 4
MIME HTML Web Page Archive 1910 1 0 1
MapInfo Spatial Table 2051 3 0 3
Virtual Calendar File 2072 4 0 4
XML Schema 2132 35 0 35
XML Paper Specification Document (Open XML) 2148 1 0 1
VTeX Multiple Master Font Metrics 2169 1 0 1
MS Visual Studio.NET Deployment Project 2184 1 0 1
Web Service Description Language 2186 3 0 3
MS Windows .NET Application Configuration 2189 2 0 2
MS Word Document (Open XML) 2208 164 0 164
MS Excel Spreadsheet (Open XML) 2209 39 0 39
MS PowerPoint Presentation (Open XML) 2210 220 0 220
UNIX Program/Program Library (32-bit) 2264 1 0 1
UNIX Shell Archive 2311 1 0 1
GNU Info Hypertext Document 2318 1 0 1
EXtensible Markup Language (UTF-16LE) 2330 24 0 24
EXtensible Markup Language (UTF-8) 2331 960 0 960
ArcExplorer Project 2359 2 0 2
Grace Project File 2360 5 0 5
MS J# Source Code 2394 123 0 123
Personal Home Page Script 2403 10 0 10
Debian Linux Package 2407 1 0 1
AppleSingle MIME Format 2438 10 0 10
AppleDouble MIME Format 2439 5 0 5
Google Earth Keyhole Markup Language 2469 611 0 611
Medical Waveform Description 2484 1 0 1
OpenDocument Text 2500 2 0 2
AVS Field Data 2532 1 0 1
Object Oriented Graphics Library: Objects (ASCII) 2551 2 0 2
ACIS 3D Model 2563 4 0 4
Facility for Interactive Generation File 2596 5 0 5
MS Windows Media Player Play List 2602 1 0 1
Perfect Office Document 2603 2 0 2
The Bat! Email Message 2604 2 0 2
Yahoo! Mail Email Message 2606 1 0 1
OpenOffice Impress Presentation / Template 2612 2 0 2
Applixware Graphic Image 2696 2 0 2
MS Word for Windows Document (pre-OLE) 2738 197 0 197
Adobe Acrobat Forms Data Format 2805 2 0 2
LDAP Data Interchange Format 2814 1 0 1
HyperText Markup Language (UTF-16BE) 2829 1 0 1
HyperText Markup Language (UTF-16LE) 2830 27 0 27
HyperText Markup Language (UTF-8) 2831 27 0 27
MS InfoPath Document (XML) 2843 1 0 1
Windows Policy Template 2852 1 0 1
Affix File 2855 14 0 14
NetCDF CDL Metadata 2950 3 0 3
Logger Pro Data 2970 133 0 133
MS Works Database 3 for Windows 3006 4 0 4
Digital Asset Exchange File 3038 1 0 1
Pretty Good Privacy Signed Message (ASCII) 3084 7 0 7
Pretty Good Privacy Public Key Block (ASCII) 3085 1 0 1
Linux Journalled Flash File System Image (JFFS,Intel) 3139 5 0 5
dBase II Database 3157 1 0 1
3D Systems Stereolithography CAD Image (Binary) 3173 2 0 2
CGNS Advanced Data Format Database 3225 18 0 18
ACE/gr Parameter Data (ASCII) 3240 3 0 3
Palm OS Application 3278 2 0 2
Palm OS Dynamic Library 3295 1 0 1
Mobipocket eBook 3297 1 0 1
MS Rich Text Format Document (Mac) 3299 58 0 58
Wyko Vision Dataset (ASCII) 3315 14 0 14
Google Earth Keyhole Markup Language (Compressed) 3316 660 0 660
MS FrontPage Document (XML) 3317 81 0 81
Netscape Browser Bookmarks 3318 6 0 6
Web Script Source Code File 3319 17 0 17
Tgif Drawing 3346 6 0 6
Apple Property List 3410 2 0 2
ArcInfo Coverage Export 3466 101 0 101
Earth Resource Mapping Satellite Image Header 3488 1 0 1
--------------------------------------------------------------- ----- ---------- ---------- ----------
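For anyone who wants to work with these numbers programmatically: each data row above is a label followed by the Included, Filtered, and Total counts, and the File Types table adds a type index before the counts. Below is a minimal Python sketch for reading such rows, assuming the columns are whitespace-separated as shown here. The names `Row`, `parse_row`, and `has_index` are illustrative and are not part of Forensic Innovations' tooling.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    label: str
    index: Optional[int]  # type index; only present in the File Types table
    included: int
    filtered: int
    total: int


def parse_row(line: str, has_index: bool = False) -> Optional[Row]:
    """Split one report row into its label and trailing integer columns.

    Returns None for rows that do not end in plain integers, such as
    section headings, separator lines, date rows, and the scan-time row.
    """
    n = 4 if has_index else 3
    # Split from the right so that spaces inside the label are preserved.
    parts = line.rsplit(None, n)
    if len(parts) != n + 1:
        return None
    label, *fields = parts
    if not all(field.isdigit() for field in fields):
        return None
    numbers = [int(field) for field in fields]
    index = numbers.pop(0) if has_index else None
    included, filtered, total = numbers
    return Row(label.strip(), index, included, filtered, total)


# Rows copied from the report above.
summary = parse_row("Commodore Amiga/64 239058 273 239331")
file_type = parse_row("Text File 3 88856 0 88856", has_index=True)

# Simple consistency check: Included + Filtered should equal Total.
for row in (summary, file_type):
    assert row.included + row.filtered == row.total
```

Pass `has_index=True` only for rows from the File Types table; without it, the type index would be absorbed into the label.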
Hi there!
Thanks for the report. Is the scan time really in ms with some strange formatting or is it hh:mm:ss?
Thanks!
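One way to sanity-check the hh:mm:ss reading is to convert the reported scan time to seconds and see what throughput it implies. This is a minimal sketch that uses only the totals from the summary above; the helper name `hms_to_seconds` is illustrative.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Totals taken from the "Summary of Investigated Source" section.
total_files = 986278
total_kb = 488658258
scan_seconds = hms_to_seconds("10:12:37")

files_per_second = total_files / scan_seconds
kb_per_second = total_kb / scan_seconds

print(f"{scan_seconds} s, {files_per_second:.1f} files/s, {kb_per_second:.0f} KB/s")
```

Read as hh:mm:ss, the scan took a little over ten hours, which works out to roughly 27 files per second and on the order of 13 MB per second. Those are plausible figures for a scan of this size, so the hh:mm:ss label looks consistent with the rest of the report.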