web2py: downloading a zip file created from cStringIO text strings in a controller

I have a web2py-based portal where a group of annotators provide text labels and bounding-box information for the images shown to them. I'd like to export this information as XML files (one per image) and add a download-annotations feature to the portal that serves a ZIP file containing all of these XML files. I can create a ZIP download from the portal, but unzipping it fails with the following error:

mohit@nightfury13:~/Downloads$ unzip Arabic\    
Archive:  Arabic
      End-of-central-directory signature not found.  Either this file is not
      a zipfile, or it constitutes one disk of a multi-part archive.  In the
      latter case the central directory and zipfile comment will be found on
      the last disk(s) of this archive.
    unzip:  cannot find zipfile directory in one of Arabic or
            Arabic, and cannot find Arabic, period.

Following is the code I've written to perform this task. Can someone point out what I am doing wrong? The first half of the code prepares the dataset XML string (you may skip that). The second half is the zip pipeline.

def prepare_dataset():
    import os
    from PIL import Image
    import zipfile, cStringIO

    # Check if a valid data-id was passed.
    if not request.vars.data_id:
        session.flash = 'No dataset selected for download'
        redirect(URL('default', 'select_db?redirect=1'))

    # Create the annotation-data in a proper format.
    else:
        data_id = int(request.vars.data_id)
        dataset = db([0]
        root_path = dataset['data_path'].split('cropped')[0]
        root_images = [i for i in os.listdir(root_path) if i.endswith('.jpg') or i.endswith('.jpeg') or i.endswith('.png')]
        content = {}
        imgs_data = db((db.Images.data_id==data_id)&(
        for img_data in imgs_data:
            label = img_data['FinalLabels']['label']
            if 'bad' not in [i.lower() for i in label.split()]:
                img_name = img_data['Images']['img_name']
                root_img_name = img_name.split('_')[0]
                xmin, ymin, xmax, ymax = img_name.split('.')[0].split('_')[2:]

                if root_img_name not in content:
                    r_im_name = [i_name for i_name in root_images if root_img_name in i_name][0] # This one also has the extension
                    root_im =, r_im_name))
                    r_depth = 3
                    if not root_im.mode=='RGB':
                        r_depth = 1
                    r_width, r_height = root_im.size

                    content[root_img_name] = {'name':r_im_name, 'depth':r_depth, 'width':r_width, 'height':r_height, 'crops':[{'label':label, 'xmin':xmin, 'ymin':ymin, 'xmax':xmax, 'ymax':ymax}]}
                else:
                    content[root_img_name]['crops'].append({'label':label, 'xmin':xmin, 'ymin':ymin, 'xmax':xmax, 'ymax':ymax})

        # Compress img-annotation data (content) to zip and export
        zip_chunks = cStringIO.StringIO()
        zipf = zipfile.ZipFile(zip_chunks, "w", compression=zipfile.ZIP_DEFLATED)

        for im_name in content:
            root_im = content[im_name]
            root_folder = filter(None, root_path.split('/'))[-1]
            xml_str = """<annotation>
    <segmented>0</segmented>""" % (root_folder, im_name, root_im['name'], root_im['width'], root_im['height'], root_im['depth'])

            for crop in root_im['crops']:
    </object>""" % (crop['label'], crop['xmin'], crop['ymin'], crop['xmax'], crop['ymax'])

            zipf.writestr(im_name+'.xml', xml_str)

        zip_name = dataset['data_name']+''
        file_header = 'attachment; filename='+zip_name
        response.headers['Content-Type'] = 'application/zip'
        response.headers['Content-Disposition'] = file_header
        return zipf


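  • ZipFile writes the archive data into the zip_chunks StringIO object, so you must return zip_chunks.getvalue(), not zipf. Call zipf.close() before that: the central directory that unzip reports as missing is only written when the archive is closed.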
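
    A minimal sketch of the in-memory zip pattern, outside web2py so it can be run on its own. It assumes Python 3, where io.BytesIO takes the place of cStringIO.StringIO; build_zip_bytes and the sample entry are hypothetical names used only for illustration, not part of the original controller.

```python
import io
import zipfile


def build_zip_bytes(xml_by_name):
    """Return the raw bytes of a ZIP archive with one XML file per entry.

    xml_by_name is a hypothetical dict mapping a base name to an XML string.
    """
    buf = io.BytesIO()  # Python 3 stand-in for cStringIO.StringIO
    # Leaving the with-block closes the ZipFile, which is the point at
    # which the central directory is written into the buffer.
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zipf:
        for name, xml_str in xml_by_name.items():
            zipf.writestr(name + ".xml", xml_str)
    # Return the buffer contents, not the ZipFile object.
    return buf.getvalue()


payload = build_zip_bytes({"img1": "<annotation></annotation>"})
```

    In the controller from the question, the equivalent change would presumably be to call zipf.close() after the loop and end with return zip_chunks.getvalue(), leaving the response headers as they are.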