DigitalCorpora is now available on Amazon S3

You can download any file using this new base URL: https://downloads.digitalcorpora.org/corpora/

For example, corpus file drives/nps-2014-xbox1/narrative.txt can be downloaded from:
https://downloads.digitalcorpora.org/corpora/drives/nps-2014-xbox1/narrative.txt
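The rule above (base URL plus the path of the file within the corpus) can be sketched in a few lines of Python. This is only an illustration: the helper name corpus_url is hypothetical and not part of any DigitalCorpora tooling, and percent-encoding the path is an assumption made for safety with unusual file names.

```python
from urllib.parse import quote

# Base URL as given in the announcement.
BASE_URL = "https://downloads.digitalcorpora.org/corpora/"


def corpus_url(path):
    """Build a download URL from a path relative to the corpus root.

    Hypothetical helper: strips any leading slash and percent-encodes
    characters that are not URL-safe, leaving "/" separators intact.
    """
    return BASE_URL + quote(path.lstrip("/"))


if __name__ == "__main__":
    print(corpus_url("drives/nps-2014-xbox1/narrative.txt"))
```

The resulting string can be passed to any HTTP client, such as urllib.request.urlretrieve or a command-line downloader.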

You can browse the DigitalCorpora S3 bucket using our browser at https://downloads.digitalcorpora.org/corpora/.

You can also use our JavaScript browser: https://digitalcorpora.org/s3_browser.html

DigitalCorpora has joined the AWS Open Data Sponsorship Program!

The DigitalCorpora project recently joined the AWS Open Data Sponsorship Program. This means that big changes are underway, and they are all for the better!

Under the Open Data program, we will be transitioning the DigitalCorpora corpus to storage in Amazon S3. We will have a gateway to preserve the existing links at http://downloads.digitalcorpora.org/corpora/, but the gateway will redirect you to an AWS S3 download. We will also be adding TLS for downloads.

This means that the download server will be more stable moving forward. It will also make it much easier (and faster) to analyze the corpus from the Amazon Cloud—and all forensics should be done in the cloud, right?

We’ve been hosting the downloads server at George Mason University since 2015 and it’s been a great ride! We will probably keep the GMU downloads server operational as our backup downloads server for the next year or two.

As preparation for the migration, we will be removing all symbolic links from the downloads server. Currently, the following symbolic links are in use:

dcorpora@digitalcorpora:~/corpora$ ls -l `find . -type l -print`
lrwxrwxrwx 2 dcorpora apache  6 May 16  2015 ./cell-phones -> mobile
lrwxrwxrwx 2 dcorpora apache  6 May 16  2015 ./disk-images -> drives
lrwxrwxrwx 1 dcorpora apache 22 Aug 19  2019 ./mobile/2019-owl -> ../scenarios/2019-owl/
lrwxrwxrwx 2 dcorpora apache  7 May 16  2015 ./network-packet-dumps -> packets
lrwxrwxrwx 1 dcorpora apache 33 Jul 18 09:16 ./scenarios/2008-m57-jean -> ../disk-images/nps-2008-m57-jean/
lrwxrwxrwx 1 dcorpora apache 23 Aug 18  2019 ./scenarios/2008-nitroba -> ../packets/2008-nitroba
lrwxrwxrwx 2 dcorpora apache 42 May 17  2015 ./scenarios/2009-m57-patents/drives-redacted -> ../../drives/nps-2009-m57-patents-redacted
lrwxrwxrwx 2 dcorpora apache 30 May 17  2015 ./scenarios/2009-m57-patents/net -> ../../packets/2009-m57-patents
lrwxrwxrwx 2 dcorpora apache 30 May 17  2015 ./scenarios/2009-m57-patents/ram -> ../../ram/nps-2009-m57-patents
lrwxrwxrwx 2 dcorpora apache 33 May 17  2015 ./scenarios/2009-m57-patents/usb -> ../../drives/nps-2009-patents/usb
dcorpora@digitalcorpora:~/corpora$
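If you keep a local mirror of the corpus and want to check it for the same kind of links, a rough Python equivalent of the find command above might look like the following. This is a sketch, not the script used on the downloads server; the function name find_symlinks is hypothetical.

```python
import os


def find_symlinks(root):
    """Return sorted (link_path, target) pairs for every symlink under root.

    Link paths are reported relative to root. Symlinked directories are
    listed but not descended into.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        # Symlinks to directories show up in dirnames; symlinks to files
        # (and dangling links) show up in filenames.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                found.append((os.path.relpath(path, root), os.readlink(path)))
    return sorted(found)


if __name__ == "__main__":
    for link, target in find_symlinks("."):
        print(f"{link} -> {target}")
```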

These links will be resolved as follows:

  • The cell-phones, disk-images and network-packet-dumps links will be removed.
  • The proper home for 2019-owl will be under scenarios/; a file 2019-owl.txt will be left behind in the /mobile directory.
  • All of the m57 data will be moved under the scenario. As above, text files will be left behind.
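Scripts that still use the old top-level aliases will need to switch to the real directory names once those links are removed. The sketch below shows one way to rewrite such paths. The alias table is copied from the symbolic-link listing above; the helper name canonical_path is hypothetical, and the sketch deliberately covers only the three top-level aliases, not the scenario-level moves.

```python
# Top-level aliases taken from the symbolic-link listing:
# each key was a symlink pointing at the directory named by its value.
TOP_LEVEL_ALIASES = {
    "cell-phones": "mobile",
    "disk-images": "drives",
    "network-packet-dumps": "packets",
}


def canonical_path(path):
    """Replace an aliased top-level directory with its real name.

    Hypothetical helper: paths that do not start with a known alias
    are returned unchanged.
    """
    first, sep, rest = path.partition("/")
    return TOP_LEVEL_ALIASES.get(first, first) + sep + rest


if __name__ == "__main__":
    print(canonical_path("disk-images/nps-2014-xbox1/narrative.txt"))
```

Only the first path component is examined, so a file or directory deeper in the tree that happens to share an alias name is left untouched.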