
crawl-300d-2M-subword.zip corrupted or cannot be downloaded


I am trying to use the fastText model crawl-300d-2M-subword.zip from the official page on my Windows machine, but the download fails within the last few KB.

I managed to download the zip file to my Ubuntu server using wget, but the archive turns out to be corrupted whenever I try to unzip it. Example of what I am getting:

unzip crawl-300d-2M-subword.zip
Archive:  crawl-300d-2M-subword.zip
  inflating: crawl-300d-2M-subword.vec
  inflating: crawl-300d-2M-subword.bin   bad CRC ff925bde  (should be e9be08f7)

It is always the file crawl-300d-2M-subword.bin, the one I am interested in, that fails during unzipping.

I have tried both approaches many times without success. It seems that no one has reported this issue before.


Solution

  • I've just downloaded & unzipped that file with no errors, so the problem is likely unique to your system's configuration, tools, or network path to the download servers.

    One common problem that's sometimes not prominently reported by a tool like wget is a download that keeps ending early, resulting in a truncated local file.

    • Is the zip file you received exactly 681,808,098 bytes long? (That's what I get.)
    • What if you try another download tool instead, like curl? (A different client might not trigger the same problem.)
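
    The size check above, plus a CRC check like the one unzip performs, can be scripted so that every download attempt is tested the same way. This is a minimal sketch using only the Python standard library; the function name check_archive is hypothetical, and the expected size is simply the figure quoted above, not an officially published value.

```python
import os
import zipfile

# Size quoted in the answer above; an observation, not an official checksum.
EXPECTED_SIZE = 681_808_098


def check_archive(path, expected_size=EXPECTED_SIZE):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []

    # A truncated download shows up first as a size mismatch.
    actual = os.path.getsize(path)
    if actual != expected_size:
        problems.append(f"size is {actual} bytes, expected {expected_size}")

    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() reads every member and returns the first name
            # whose CRC does not match, or None if all members pass.
            bad_member = archive.testzip()
            if bad_member is not None:
                problems.append(f"bad CRC in member {bad_member}")
    except zipfile.BadZipFile as exc:
        # The zip directory lives at the end of the file, so a download
        # that stops early often cannot be opened at all.
        problems.append(f"not a readable zip: {exc}")

    return problems
```

    Calling check_archive("crawl-300d-2M-subword.zip") after each attempt shows whether the file is short, whether a member fails its CRC, or both. Reading the whole archive to verify it takes a while for a file this size.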

    Sometimes, if repeated downloads keep failing in the same way, the cause is subtle misconfiguration, bugs, or corruption unique to the network path from your machine to the peer (download origin) machine.

    • Can you do a successful download of the zip file (of full size per above) to anywhere else?
    • Then, transfer from that secondary location to where you really want it?
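
    If you relay the file through a secondary location, it helps to confirm that the copy at the destination is byte-for-byte identical to the copy that unzipped cleanly. One way, sketched here with a hypothetical helper named sha256_of, is to hash the file on both machines and compare the two hex strings.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so a large archive never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

    Run sha256_of("crawl-300d-2M-subword.zip") on the machine where the download succeeded and again where the file ends up. Matching digests mean the transfer did not alter the file; they say nothing about whether the first download was itself complete, so the size check still applies.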

    If you're having problems on both a Windows machine and an Ubuntu server, are they both on the same local network, and perhaps subject to the same network issues: either bugs, or policies that cut a particular long download short?
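
    If something on the path does cut long downloads short, one workaround is to resume from where the previous attempt stopped instead of starting over. The sketch below uses an HTTP Range request through Python's urllib; it assumes the download server honours range requests, which I have not verified for this host. The function names are hypothetical, and the URL is left as a parameter for you to fill in.

```python
import os
import urllib.error
import urllib.request


def build_resume_request(url, partial_path):
    """Build a GET request asking only for the bytes not yet on disk."""
    have = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    request = urllib.request.Request(url)
    if have:
        request.add_header("Range", f"bytes={have}-")
    return request, have


def resume_download(url, partial_path, chunk_size=1024 * 1024):
    """Append the missing tail to partial_path and return the final size."""
    request, have = build_resume_request(url, partial_path)
    try:
        with urllib.request.urlopen(request) as response:
            # 206 means the server sent only the requested range; any other
            # success status means it is sending the whole file again.
            mode = "ab" if have and response.status == 206 else "wb"
            with open(partial_path, mode) as out:
                while True:
                    chunk = response.read(chunk_size)
                    if not chunk:
                        break
                    out.write(chunk)
    except urllib.error.HTTPError as exc:
        # 416 means the requested range starts at or past the end of the
        # file, which normally indicates nothing is left to fetch.
        if exc.code != 416:
            raise
    return os.path.getsize(partial_path)
```

    Calling resume_download repeatedly until the returned size stops growing, and then comparing it with the expected byte count, gives the same effect as a download tool's resume option. If the connection drops mid-transfer, the bytes already written stay on disk for the next call.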