python google-colaboratory unzip zip

Extract a multi-part zip stored on Google Drive from a Google Colab notebook


I have a zip archive split into five parts on Google Drive: 'train.zip.001', 'train.zip.002', 'train.zip.003', 'train.zip.004' and 'train.zip.005'. Each part is 8 GB. I don't know how to extract them.

I tried:

    import zipfile

    # Attempt 1: open the base name
    with zipfile.ZipFile('train.zip', 'r') as zipob:
      zipob.extractall('train2')

    # Attempt 2: open the first part directly
    with zipfile.ZipFile('train.zip.001', 'r') as zipob:
      zipob.extractall('train2')

They gave two different errors:


BadZipFile                                Traceback (most recent call last)
<ipython-input-32-ebacbe394be2> in <module>()
----> 1 with zipfile.ZipFile('train.zip','r') as zipob:
      2   zipob.extractall('train2')

1 frames
/usr/lib/python3.6/zipfile.py in _RealGetContents(self)
   1196             raise BadZipFile("File is not a zip file")
   1197         if not endrec:
-> 1198             raise BadZipFile("File is not a zip file")
   1199         if self.debug > 1:
   1200             print(endrec)

BadZipFile: File is not a zip file

Running unzip from the notebook also failed:

!unzip train.zip.001
Archive:  train.zip.001
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of train.zip.001 or
        train.zip.001.zip, and cannot find train.zip.001.ZIP, period.

Neither approach works.


Solution

  • I once had to extract a 7z archive split into 64 parts (7z.001, 7z.002, ...). This command worked for me:

    !7z x "/content/drive/My Drive/GitHub/DATA/images.7z.001" -tsplit

    Maybe it will be useful for you.
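
For the zip parts in the question, a pure-Python route may also work. The following is only a minimal sketch, and it rests on an assumption the question does not confirm: that 'train.zip.001' through 'train.zip.005' are plain byte-wise slices of a single zip file (the kind of output 7-Zip volume splitting or the split command typically produces), not a true multi-disk zip. The function names join_parts and extract_split_zip are hypothetical helpers made up for this example.

```python
import glob
import shutil
import zipfile


def join_parts(first_part, joined_path):
    """Concatenate name.001, name.002, ... into one file at joined_path.

    ASSUMPTION: the parts are raw byte slices of one archive.
    Returns the list of part paths in the order they were joined.
    """
    if not first_part.endswith(".001"):
        raise ValueError("expected the first part to end with '.001'")
    prefix = first_part[: -len("001")]  # e.g. 'train.zip.'
    # Zero-padded three-digit suffixes sort correctly as plain strings.
    parts = sorted(glob.glob(glob.escape(prefix) + "[0-9][0-9][0-9]"))
    with open(joined_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Copy in 16 MiB chunks so an 8 GB part never sits in RAM.
                shutil.copyfileobj(src, out, 16 * 1024 * 1024)
    return parts


def extract_split_zip(first_part, joined_path, dest_dir):
    """Join the parts, then extract the joined zip into dest_dir."""
    join_parts(first_part, joined_path)
    with zipfile.ZipFile(joined_path, "r") as zf:
        zf.extractall(dest_dir)


# Hypothetical call for the files in the question:
# extract_split_zip("train.zip.001", "train_joined.zip", "train2")
```

The joined file needs about as much free disk space as all the parts combined (roughly 40 GB here), in addition to the space for the extracted data. If zipfile still raises BadZipFile on the joined file, the assumption about how the parts were created probably does not hold.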