Bots downloading disk images

I’m preparing some statistics on who (and what) is downloading the disk images we host here at digitalcorpora.org. The first step was to filter out the bots that are, for whatever reason, downloading the images.
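As an illustration only, a filter of this kind could be sketched in Python as follows. The pattern, the function names, and the idea of matching on the User-Agent header are assumptions made for the example, not a description of the filter actually used for these statistics.

```python
import re

# Hypothetical heuristic: treat any User-Agent that contains one of these
# tokens (case-insensitive) as a bot. "bot" also matches "robot".
BOT_PATTERN = re.compile(r"bot|crawler|spider", re.IGNORECASE)


def is_bot(user_agent: str) -> bool:
    """Return True if the User-Agent string looks like a crawler."""
    return bool(BOT_PATTERN.search(user_agent))


def split_downloads(user_agents):
    """Split an iterable of User-Agent strings into (bots, others)."""
    bots, others = [], []
    for ua in user_agents:
        (bots if is_bot(ua) else others).append(ua)
    return bots, others
```

A substring match like this is crude: it can flag an ordinary client whose User-Agent happens to contain "bot", and it will miss crawlers that identify themselves as regular browsers. Every User-Agent string in the table below does contain "bot" in some capitalization, so the pattern would have caught all of them.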

Here are the bots that we’ve found, and the number of image downloads made by each one, grouped by User-Agent string.

  Rank     Count     User-Agent
  ============================
      1      2334      Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
      2       851      MLBot (www.metadatalabs.com/mlbot)
      3       811      SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)
      4       749      Mozilla/5.0 (compatible; DotBot/1.1; http://www.dotnetdotcom.org/, crawler@dotnetdotcom.org)
      5       492      Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
      6       130      Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
      7       115      Mozilla/5.0 (compatible; DBLBot/1.0; +http://www.dontbuylists.com/)
      8       109      msnbot/2.0b (+http://search.msn.com/msnbot.htm)
      9       108      Mozilla/5.0 (compatible; SiteBot/0.1; +http://www.sitebot.org/robot/)
     10        89      CCBot/1.0 (+http://www.commoncrawl.org/bot.html)
     11        87      Mozilla/5.0 (Twiceler-0.9 http://www.cuil.com/twiceler/robot.html)
     12        78      TwengaBot-Discover (http://www.twenga.fr/bot-discover.html)
     13        58      Mozilla/5.0 (compatible; Purebot/1.1; +http://www.puritysearch.net/)
     14        51      msnbot/1.1 (+http://search.msn.com/msnbot.htm)
     15        26      Mozilla/5.0 (compatible; MJ12bot/v1.3.2; http://www.majestic12.co.uk/bot.php?+)
     16        21      Cityreview Robot (+http://www.cityreview.org/crawler/)
     17        18      'citeseerxbot'
     18        15      SindiceBot (heritrix/2.0.2 +http://sindice.com/developers/bot)
     19        12      Mozilla/5.0 (compatible; MJ12bot/v1.3.1; http://www.majestic12.co.uk/bot.php?+)
     20        11      Mozilla/5.0 (compatible; discobot/1.1; +http://discoveryengine.com/discobot.html
     21         9      Mozilla/5.0 (compatible; Exabot/3.0; +http://www.exabot.com/go/robot)
     22         7      CatchBot/3.0; +http://www.catchbot.com
                7      CyberPatrol SiteCat Webbot (http://www.cyberpatrol.com/cyberpatrolcrawler.asp)
                7      yacybot (amd64 Linux 2.6.26-2-xen-amd64; java 1.6.0_20; Europe/en) http://yacy.net/bot.html
     25         6      Mozilla/5.0 (compatible; Search17Bot/1.1; http://www.search17.com/bot.php
                6      yacybot (amd64 Linux 2.6.26-2-xen-amd64; java 1.6.0_20; Europe/de) http://yacy.net/bot.html
     27         5      MSRBOT (http://research.microsoft.com/research/sv/msrbot/)
                5      yacybot (amd64 Linux 2.6.31-20-generic; java 1.6.0_15; Europe/en) http://yacy.net/bot.html
                5      yacybot (i386 Linux 2.6.32-trunk-686; java 1.6.0_18; America/en) http://yacy.net/bot.html
     30         3      msnbot-media/1.1 (+http://search.msn.com/msnbot.htm)
     31         2      Mozilla/5.0 (SnapPreviewBot) Gecko/20061206 Firefox/1.5.0.9
                2      yacybot (amd64 Linux 2.6.26-2-amd64; java 1.6.0_20; Europe/en) http://yacy.net/bot.html
                2      yacybot (amd64 Linux 2.6.28-18-generic; java 1.6.0_19; GMT/en) http://yacy.net/bot.html
                2      yacybot (i386 Linux 2.6.31-21-generic; java 1.6.0_0; Europe/en) http://yacy.net/bot.html
     35         1      Mozilla/5.0 (compatible; Googlebot/2.1;  http://www.google.com/bot.html
                1      Mozilla/5.0 (compatible; discobot/1.1; +http://discoveryengine.com/discobot.html)
                1      findfiles.net/0.96 (Robot;test_robot@gmx-topmail.de)
                1      librabot/1.0 (+http://search.msn.com/msnbot.htm)
                1      yacybot (amd64 Linux 2.6.18-164.11.1.el5xen; java 1.6.0; Europe/en) http://yacy.net/bot.html
                1      yacybot (amd64 Linux 2.6.18-164.15.1.el5; java 1.6.0_14; Europe/de) http://yacy.net/bot.html
                1      yacybot (x86 Windows XP 5.1; java 1.6.0_18; Europe/de) http://yacy.net/bot.html
                1      yacybot (x86 Windows XP 5.1; java 1.6.0_20; Europe/de) http://yacy.net/bot.html
                1      yacybot (x86_64 Mac OS X 10.6.4; java 1.6.0_20; America/en) http://yacy.net/bot.html 

Total items printed: 6242
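
A tally like the one above can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the input is a plain list of User-Agent strings (one per bot download), the function names are hypothetical, and the ranking rule (tied counts share a rank, and the next rank skips ahead) is inferred from the table. The final line is taken to be the sum of the Count column, which matches the table: its counts add up to 6242.

```python
from collections import Counter


def rank_user_agents(user_agents):
    """Return (rank, count, value) rows, highest count first.

    Tied counts share a rank and the following rank skips ahead
    (1, 2, 2, 4, ...). Ties are ordered alphabetically by value.
    """
    counts = Counter(user_agents)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    rows = []
    rank, prev_count = 0, None
    for position, (value, count) in enumerate(ordered, start=1):
        if count != prev_count:
            rank, prev_count = position, count
        rows.append((rank, count, value))
    return rows


def format_rows(rows):
    """Render rows as text, printing the rank only on the first row of a tie."""
    lines = []
    prev_rank = None
    for rank, count, value in rows:
        label = str(rank) if rank != prev_rank else ""
        lines.append(f"{label:>7}{count:>10}      {value}")
        prev_rank = rank
    lines.append("")
    # Assumption: the total is the sum of all counts, not the number of rows.
    lines.append(f"Total items printed: {sum(count for _, count, _ in rows)}")
    return "\n".join(lines)
```

Calling `print(format_rows(rank_user_agents(bot_user_agents)))`, where `bot_user_agents` is whatever list of bot User-Agent strings was extracted from the server log, produces output in the same general layout as the table.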