Search code examples
bashunzip

Can't curl then unzip zip file


I'm just trying to curl this zip file, then unzip it

curl -sS https://www.kaggle.com/c/word2vec-nlp-tutorial/download/labeledTrainData.tsv.zip > labeledTrainData.tsv.zip
unzip labeledTrainData.tsv.zip labeledTrainData.tsv

but I keep getting the error;

Archive:  labeledTrainData.tsv.zip
End-of-central-directory signature not found.  Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive.  In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.

I'm using the same syntax as found in this response, I think. Is there something wrong with the file I'm downloading? I feel like I'm making a noob mistake. I run these two commands in a shell script


Solution

  • I am able to replicate your error. This sort of error generally indicates one of two things:

    1. The file was not packaged properly
    2. You aren't downloading what you think you're downloading.

    In this case, your problem is the latter. It looks like you're downloading the file from the wrong URL. I'm seeing this when I open up the alleged zip file for reading:

    <html><head><title>Object moved</title></head><body>
    <h2>Object moved to <a href="/account/login?ReturnUrl=%2fc%2fword2vec-nlp-tutorial%2fdownload%2flabeledTrainData.tsv.zip">here</a>.</h2>
    </body></html>
    

    Long story short, you need to download from the alternate URL specified above. Additionally, Kaggle usually requires login credentials when downloading, so you'll need to specify your username/password as well.