We are transitioning http://downloads.digitalcorpora.org/corpora/ from George Mason University to Amazon S3. The transition should be complete by 04:45 GMT on February 3, 2021. Once it is finished, you will also be able to access digitalcorpora at https://downloads.digitalcorpora.org/corpora/.
The old downloads system can still be accessed at http://gmu.digitalcorpora.org/corpora/
We are moving to a new corpora browser that runs on the digitalcorpora web server, scans the AWS S3 bucket, and serves AWS S3 https:// links.
You can try the new browser here: https://app.digitalcorpora.org/corpora/
Please submit comments and requests here: https://github.com/digitalcorpora/app/issues.
November 23rd, 2020
admin
You can download any file using this new base URL: http://digitalcorpora.s3-website.us-west-2.amazonaws.com/corpora/
For example, corpus file drives/nps-2014-xbox1/narrative.txt can be downloaded from:
http://digitalcorpora.s3-website.us-west-2.amazonaws.com/corpora/drives/nps-2014-xbox1/narrative.txt
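As a minimal sketch of the rule above, the following Python snippet builds a download URL by appending a corpus-relative path to the base URL. The helper name `corpus_url` is hypothetical and chosen only for illustration; the base URL and the example path are the ones quoted in this post.

```python
from urllib.parse import quote

# Base URL quoted from the announcement above.
BASE_URL = "http://digitalcorpora.s3-website.us-west-2.amazonaws.com/corpora/"


def corpus_url(relative_path: str) -> str:
    """Return the download URL for a path relative to the corpora/ root.

    Hypothetical helper, not part of any DigitalCorpora tooling.
    """
    # Strip leading slashes so the path is appended to the base URL.
    cleaned = relative_path.lstrip("/")
    # quote() percent-encodes unsafe characters and leaves "/" intact by default.
    return BASE_URL + quote(cleaned)


if __name__ == "__main__":
    print(corpus_url("drives/nps-2014-xbox1/narrative.txt"))
```

The resulting URL can be passed to any HTTP client, for example `urllib.request.urlretrieve(url, "narrative.txt")` from the Python standard library.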
You can browse the DigitalCorpora S3 bucket here: https://digitalcorpora.org/s3_browser.html
November 21st, 2020
admin
The DigitalCorpora project recently joined the AWS Open Data Sponsorship Program. This means that big changes are underway, and they are all for the better!
Under the Open Data program, we will be transitioning the digitalcorpora corpus to storage in Amazon S3. We will have a gateway to preserve the existing links at http://downloads.digitalcorpora.org/corpora/, but the gateway will redirect you to an AWS S3 download. We will also be adding TLS for downloads.
This means that the download server will be more stable moving forward. It will also make it much easier (and faster) to analyze the corpus from the Amazon Cloud—and all forensics should be done in the cloud, right?
We’ve been hosting the downloads server at George Mason University since 2015 and it’s been a great ride! We will probably keep the GMU downloads server operational as our backup downloads server for the next year or two.
As preparation for the migration, we will be removing all symbolic links from the downloads server. Currently, the following symbolic links are in use:
dcorpora@digitalcorpora:~/corpora$ ls -l `find . -type l -print`
lrwxrwxrwx 2 dcorpora apache 6 May 16 2015 ./cell-phones -> mobile
lrwxrwxrwx 2 dcorpora apache 6 May 16 2015 ./disk-images -> drives
lrwxrwxrwx 1 dcorpora apache 22 Aug 19 2019 ./mobile/2019-owl -> ../scenarios/2019-owl/
lrwxrwxrwx 2 dcorpora apache 7 May 16 2015 ./network-packet-dumps -> packets
lrwxrwxrwx 1 dcorpora apache 33 Jul 18 09:16 ./scenarios/2008-m57-jean -> ../disk-images/nps-2008-m57-jean/
lrwxrwxrwx 1 dcorpora apache 23 Aug 18 2019 ./scenarios/2008-nitroba -> ../packets/2008-nitroba
lrwxrwxrwx 2 dcorpora apache 42 May 17 2015 ./scenarios/2009-m57-patents/drives-redacted -> ../../drives/nps-2009-m57-patents-redacted
lrwxrwxrwx 2 dcorpora apache 30 May 17 2015 ./scenarios/2009-m57-patents/net -> ../../packets/2009-m57-patents
lrwxrwxrwx 2 dcorpora apache 30 May 17 2015 ./scenarios/2009-m57-patents/ram -> ../../ram/nps-2009-m57-patents
lrwxrwxrwx 2 dcorpora apache 33 May 17 2015 ./scenarios/2009-m57-patents/usb -> ../../drives/nps-2009-patents/usb
dcorpora@digitalcorpora:~/corpora$
These links will be resolved as follows:
- The cell-phones, disk-images and network-packet-dumps links will be removed.
- The proper home for 2019-owl will be under scenarios/; a file 2019-owl.txt will be left behind in the /mobile directory.
- All of the m57 data will be moved under the scenario. As above, text files will be left behind.
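If you have scripts that still use the three top-level aliases being removed, one way to update them is to rewrite the first path component to its target directory. The sketch below assumes only the alias-to-target pairs shown in the symbolic-link listing above; `canonical_path` and `LEGACY_ALIASES` are hypothetical names, and the scenario-level links (2019-owl and the m57 data) are deliberately not handled because their final layout is not fully spelled out here.

```python
# Top-level aliases and their targets, taken from the symbolic-link listing above.
LEGACY_ALIASES = {
    "cell-phones": "mobile",
    "disk-images": "drives",
    "network-packet-dumps": "packets",
}


def canonical_path(path: str) -> str:
    """Rewrite a corpus-relative path so it no longer uses a removed alias.

    Hypothetical helper: only the first path component is examined.
    """
    stripped = path.lstrip("/")
    # Split off the first component; sep is "" when the path has no "/".
    head, sep, rest = stripped.partition("/")
    # Replace the component if it is a known alias, otherwise keep it as-is.
    return LEGACY_ALIASES.get(head, head) + sep + rest


if __name__ == "__main__":
    print(canonical_path("disk-images/nps-2014-xbox1/narrative.txt"))
```

Paths that already use the real directory names pass through unchanged, so the function is safe to apply to a mixed list of old and new paths.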
A new set of iOS 13 images and files has been contributed to the collection by Joshua Hickman. Details can be found here (13.3.1) and here (13.4.1).
I am pleased to announce that a new scenario has been added to the digitalcorpora.org database! This scenario was created in collaboration with Paul Bryant of the Wellington Institute of Technology.
The details of this scenario can be found here.
A new corpus has been added to the collection by Sven Schmitt and Sebastian Nemetz. Details of this corpus can be found here.
A new set of Android 10 images and files has been contributed to the collection by Joshua Hickman. Details can be found here.
September 23rd, 2019
admin
We have published a Terms of Use for the digitalcorpora website.
We have a new teacher’s guide and solution for the 2008-Nitroba scenario, thanks to work done at UNSW Canberra by Ajoy Ghosh.