I have posted a text file containing MD5 hashes for the first 512 bytes and first 4096 bytes of every file in the GOVDOCS1 corpus. This file is intended for research on sector hashing. You can download the file from https://downloads.digitalcorpora.org/corpora/files/govdocs1/govdocs1-first512-first4096-docid.txt
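
To compute comparable hashes for your own files, something like the following Python sketch should work. The function name `prefix_md5s` is hypothetical. The handling of files shorter than 512 or 4096 bytes (it simply hashes whatever bytes are present) is an assumption, not a description of how the posted file was generated. Check the file itself for its exact layout and conventions.

```python
import hashlib


def prefix_md5s(path, sizes=(512, 4096)):
    """Return {size: hex MD5 of the first `size` bytes of the file at `path`}.

    Assumption: a file shorter than `size` is hashed as-is, with no padding.
    """
    with open(path, "rb") as f:
        # Read only as many bytes as the largest prefix requires.
        head = f.read(max(sizes))
    return {n: hashlib.md5(head[:n]).hexdigest() for n in sizes}
```

Calling `prefix_md5s("somefile.pdf")` returns a dictionary with the keys 512 and 4096, which can then be compared against the corresponding entries in the downloaded file.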