The DigitalCorpora project recently joined the AWS Open Data Sponsorship Program. This means that big changes are underway, and they are all for the better!
Under the Open Data program, we will be transitioning the digitalcorpora corpus to storage in Amazon S3. We will have a gateway to preserve the existing links at http://downloads.digitalcorpora.org/corpora/, but the gateway will redirect you to an AWS S3 download. We will also be adding TLS for downloads.
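To make the gateway idea concrete, here is a minimal sketch of how an old download link could be rewritten into an S3 link. It is only an illustration: the bucket name `example-bucket` and the function `redirect_target` are hypothetical, and the real gateway may map paths differently.

```python
from urllib.parse import quote, unquote, urlsplit

# Hypothetical bucket name, used for illustration only.
BUCKET = "example-bucket"


def redirect_target(old_url: str, bucket: str = BUCKET) -> str:
    """Map a legacy downloads URL to a virtual-hosted-style S3 HTTPS URL.

    Assumes the object key is simply the old URL path without its leading slash.
    """
    path = urlsplit(old_url).path.lstrip("/")
    # Decode first so an already percent-encoded path is not encoded twice.
    key = quote(unquote(path))
    return f"https://{bucket}.s3.amazonaws.com/{key}"


if __name__ == "__main__":
    print(redirect_target("http://downloads.digitalcorpora.org/corpora/scenarios/"))
```

A gateway built this way would answer each old request with an HTTP redirect to the rewritten URL, so existing bookmarks and scripts keep working.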
This means that the download server will be more stable moving forward. It will also make it much easier (and faster) to analyze the corpus from the Amazon Cloud—and all forensics should be done in the cloud, right?
We’ve been hosting the downloads server at George Mason University since 2015 and it’s been a great ride! We will probably keep the GMU downloads server operational as our backup downloads server for the next year or two.
As preparation for the migration, we will be removing all symbolic links from the downloads server. Currently, the following symbolic links are in use:
    dcorpora@digitalcorpora:~/corpora$ ls -l `find . -type l -print`
    lrwxrwxrwx 2 dcorpora apache  6 May 16  2015 ./cell-phones -> mobile
    lrwxrwxrwx 2 dcorpora apache  6 May 16  2015 ./disk-images -> drives
    lrwxrwxrwx 1 dcorpora apache 22 Aug 19  2019 ./mobile/2019-owl -> ../scenarios/2019-owl/
    lrwxrwxrwx 2 dcorpora apache  7 May 16  2015 ./network-packet-dumps -> packets
    lrwxrwxrwx 1 dcorpora apache 33 Jul 18 09:16 ./scenarios/2008-m57-jean -> ../disk-images/nps-2008-m57-jean/
    lrwxrwxrwx 1 dcorpora apache 23 Aug 18  2019 ./scenarios/2008-nitroba -> ../packets/2008-nitroba
    lrwxrwxrwx 2 dcorpora apache 42 May 17  2015 ./scenarios/2009-m57-patents/drives-redacted -> ../../drives/nps-2009-m57-patents-redacted
    lrwxrwxrwx 2 dcorpora apache 30 May 17  2015 ./scenarios/2009-m57-patents/net -> ../../packets/2009-m57-patents
    lrwxrwxrwx 2 dcorpora apache 30 May 17  2015 ./scenarios/2009-m57-patents/ram -> ../../ram/nps-2009-m57-patents
    lrwxrwxrwx 2 dcorpora apache 33 May 17  2015 ./scenarios/2009-m57-patents/usb -> ../../drives/nps-2009-patents/usb
    dcorpora@digitalcorpora:~/corpora$

These will be resolved as follows:
- The cell-phones, disk-images and network-packet-dumps links will be removed.
- The proper home for 2019-owl will be under scenarios/; a file 2019-owl.txt will be left behind in the /mobile directory.
- All of the m57 data will be moved under the corresponding scenario directories. As above, text files will be left behind.
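For anyone cleaning up a mirror of the corpus in the same way, the steps above can be sketched in Python. This is an assumption-laden illustration, not the script used on the downloads server: the helper names `find_symlinks` and `replace_with_pointer` are hypothetical, and the wording of the note left in each text file is made up for the example.

```python
import os
from pathlib import Path


def find_symlinks(root):
    """Return sorted (link, target) pairs for every symlink under root.

    Roughly equivalent to `find . -type l`; symlinked directories are
    reported but not descended into.
    """
    links = []
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                links.append((path, os.readlink(path)))
    return sorted(links)


def replace_with_pointer(link, new_location):
    """Remove a symlink and leave behind a <name>.txt note.

    The note text is a hypothetical format chosen for this sketch.
    """
    link = Path(link)
    note = link.with_name(link.name + ".txt")
    link.unlink()  # removes only the link, never the data it points to
    note.write_text(f"{link.name} has moved to {new_location}\n")
    return note
```

Calling `find_symlinks("corpora")` first and reviewing the list before calling `replace_with_pointer` on each entry keeps the destructive step separate from the inventory step.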