Home > Disk Images > test disk image of emails available

test disk image of emails available

February 2nd, 2011 Leave a comment Go to comments

I have created a new disk image called 2010-nps-emails that can be used for testing programs that find email addresses or perform string search.

The disk image consists of 30 different email addresses, each one stored in a different document with a different coding scheme.

Below are a list of the email addresses and their codings:

email address                             Application (Encoding)

plain_text@textedit.com                   Apple TextEdit  (UTF-8)
plain_text_pdf@textedit.com               Apple TextEdit print-to-PDF (/FlateDecode)
rtf_text@textedit.com                     Apple TextEdit (RTF)
rtf_text_pdf@textedit.com                 Apple TextEdit print-to-PDF (/FlateDecode)
plain_utf16@textedit.com                  Apple TextEdit (UTF-16)
plain_utf16_pdf@textedit.com              Apple TextEdit print-to-PDF (/FlateDecode)

pages@iwork09.com                         Apple Pages '09
pages_comment@iwork09.com                 Apple Pages (comment) '09
keynote@iwork09.com                       Apple Keynote '09
keynote_comment@iwork09.com               Apple Keynote '09 (comment)
numbers@iwork09.com                       Apple Numbers '09
numbers_comment@iwork09.com               Apple Numbers '09 (comment)

user_doc@microsoftword.com                Microsoft Word 2008 (Mac) (.doc file)
user_doc_pdf@microsoftword.com            Microsoft Word 2008 (Mac) print-to-PDF
user_docx@microsoftword.com
user_docx_pdf@microsoftword.com           Microsoft Word 2008 (Mac) print-to-PDF (.docx file)
xls_cell@microsoft_excel.com
xls_comment@microsoft_excel.com           Microsoft Word 2008 (Mac)
xlsx_cell@microsoft_excel.com             Microsoft Word 2008 (Mac)
xlsx_comment@microsoft_excel.com          Microsoft Word 2008 (Mac) (Comment)

doc_within_doc@document.com               Microsoft Word 2007 (OLE .doc file within .doc)
docx_within_docx@document.com             Microsoft Word 2007 (OLE .doc file within .doc)
ppt_within_doc@document.com               Microsoft PowerPoint and Word 2007 (OLE .ppt file within .doc)
pptx_within_docx@document.com             Microsoft PowerPoint and Word 2007 (OLE .pptx file within .docx)
xls_within_doc@document.com               Microsoft Excel and Word 2007 (OLE .xls file within .doc)
xlsx_within_docx@document.com             Microsoft Excel and Word 2007 (OLE .xlsx file within .docx)

email_in_zip@zipfile1.com                 text file within ZIP
email_in_zip_zip@zipfile2.com             ZIP'ed text file, ZIP'ed
email_in_gzip@gzipfile.com                text file within GZIP
email_in_gzip_gzip@gzipfile.com           GZIP'ed text file, GZIP'ed

The image can be downloaded from http://downloads.digitalcorpora.org/corpora/disk-images/nps-2010-emails/

Edit, 2011-11-26 19:32 PST: One email was incorrectly recorded above. xlsx_comment@microsoft_excel.com is within the disk image, but xlsx_cell_comment@microsoft_excel.com was recorded here. That is now corrected above.

Categories: Disk Images Tags:
  1. Cesar
    February 15th, 2011 at 18:35 | #1

    Link sends me to a 404 error page.

  2. April 5th, 2012 at 21:44 | #2

    The link is fixed. Thanks.

    @Cesar

  3. Tobin Craig
    October 10th, 2017 at 16:39 | #3

    The link points to a download forbidden error page.

  4. October 10th, 2017 at 18:57 | #4

    @admin
    The link is fixed again!

  1. No trackbacks yet.