Search code examples
curlzipunzip

End-of-central-directory signature not found


I have created the zip file using linux zip command and uploaded it in my google drive. When I tried to download and unzip the zipped file using curl and unzip command (using a bash file), it gives me the following error.

Archive:  pretrained_models.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of pretrained_models.zip or
        pretrained_models.zip.zip, and cannot find pretrained_models.zip.ZIP, period.

Can anyone suggest any workaround to fix this issue?

In case if anyone wants to reproduce the error, I am sharing the .sh file.

#!/bin/bash
pretrained='https://drive.google.com/uc?export=download&id=0B8ZGlkqDw7hFSm1MQ2FDVTZCTjA' 
# download pretrained models.
curl -o pretrained_models.zip $pretrained
unzip pretrained_models.zip
rm pretrained_models.zip

The file is publicly shared. For sanity check, you can download it from here.

N.B. I have seen related posts in other community of SO and some of them suggested to use different file extension but I want to stick to zip file.


Solution

  • When the shared files on Google Drive is downloaded, it is necessary to change the download method by the file size. It was found that the boundary of file size when the method is changed is about 40MB.

    Modified scripts :

    1. File size < 40MB

    #!/bin/bash
    filename="pretrained_models.zip"
    fileid="0B8ZGlkqDw7hFSm1MQ2FDVTZCTjA"
    curl -L -o ${filename} "https://drive.google.com/uc?export=download&id=${fileid}"
    

    2. File size > 40MB

    When it tries to download the file with more than 40MB, Google says to download from following URL.

    <a id="uc-download-link" class="goog-inline-block jfk-button jfk-button-action" href="/uc?export=download&amp;confirm=####&amp;id=### file ID ###">download</a>
    

    Query included confirm=#### is important for downloading the files with large size. In order to retrieve the query from the HTML, it uses pup.

    #!/bin/bash
    filename="pretrained_models.zip"
    fileid="0B8ZGlkqDw7hFSm1MQ2FDVTZCTjA"
    query=`curl -c ./cookie.txt -s -L "https://drive.google.com/uc?export=download&id=${fileid}" | pup 'a#uc-download-link attr{href}' | sed -e 's/amp;//g'`
    curl -b ./cookie.txt -L -o ${filename} "https://drive.google.com${query}"