
Convert Excel zip file content to actual Excel file?


I am using the cmis package in Python to download a document from a FileNet repository, using the package's getContentStream method. However, it returns content that begins with 'PK' and ends in 'PK'. When I googled this, I learned that it is the content of an Excel zip package. Is there a way to save the content into an Excel file that I can then open? I am using the code below, but I get the error "a bytes-like object is required, not 'str'". I noticed that the type of the result is StringIO.

# export the result
result = testDoc.getContentStream()
outfile = open('sample.xlsx', 'wb')
outfile.write(result.read())
result.close()
outfile.close()

Solution

  • Hi there, and welcome to Stack Overflow. There are a few bits I noticed about your post.

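    To answer the error you are getting directly: you opened outfile in binary mode ('wb'), so write() expects bytes, but result.read() returns a str (Unicode text), which is why you get this error. You can try to encode it before passing it to outfile.write() (ex: outfile.write(result.read().encode())). Note that encode() defaults to UTF-8, which reproduces the original bytes only if the text is pure ASCII, and a zip archive almost certainly contains bytes above 0x7F. If the text was produced by decoding the bytes as latin-1, .encode('latin-1') is the encoding that restores them unchanged.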
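    A minimal sketch of the encode-before-write idea, assuming (this is not confirmed by the cmis package's documentation) that when read() returns a str, the original bytes were decoded as latin-1. The helper names content_to_bytes and save_content_stream are hypothetical. If the assumption is wrong, the latin-1 encode raises UnicodeEncodeError instead of silently writing a corrupt file.

```python
from typing import IO, Union


def content_to_bytes(data: Union[str, bytes]) -> bytes:
    """Return the raw bytes behind a downloaded content stream.

    Assumption (hypothetical): str content was produced by decoding the
    original bytes as latin-1, which maps bytes 0-255 to code points 0-255.
    """
    if isinstance(data, bytes):
        return data  # already binary, nothing to convert
    # latin-1 turns code points 0-255 back into the identical bytes 0-255;
    # plain .encode() would use UTF-8 and expand every byte above 0x7F.
    # Raises UnicodeEncodeError if a code point > 255 shows the assumption is wrong.
    return data.encode('latin-1')


def save_content_stream(stream: IO, path: str) -> int:
    """Write the stream's content to `path` in binary mode; return bytes written."""
    data = content_to_bytes(stream.read())
    with open(path, 'wb') as outfile:
        return outfile.write(data)
```

    With the objects from the question, this would be called as save_content_stream(testDoc.getContentStream(), 'sample.xlsx'), followed by closing the stream.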

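    Alternatively, zipfile's ZipFile.writestr() accepts a str directly (it encodes it as UTF-8 before storing it), so you can write the Unicode text straight into a zip archive: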

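    from zipfile import ZipFile

    filepath = 'sample.zip'  # output path for the new archive
    result = testDoc.getContentStream()
    result_text = result.read()
    result.close()

    # writestr() accepts str data and stores it UTF-8 encoded. This creates a
    # NEW archive with a single member holding the text; it does not turn the
    # downloaded content back into the original .xlsx file.
    with ZipFile(filepath, 'w') as zf:
        zf.writestr('filename_that_is_zipped', result_text)
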

    Note: I am not sure what you have in your ContentStream, but an Excel (.xlsx) file is a set of XML files packed into a zip archive (the leading 'PK' you see is the zip file signature). The minimum file structure you need for an Excel file is as follows:

    • _rels/.rels contains the top-level relationships (schema-typed links to the workbook and docProps parts)
    • docProps/app.xml contains the number of sheets and the sheet names
    • docProps/core.xml contains boilerplate user info and the creation date
    • xl/workbook.xml contains the sheet names and the r:id that links each sheet to the workbook
    • xl/worksheets/sheet1.xml (and more sheets in this folder) contains the cell data for each sheet
    • xl/_rels/workbook.xml.rels contains the sheet file locations within the zip file
    • xl/sharedStrings.xml is needed if you have string cell values
    • [Content_Types].xml applies schemas to file types
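    To see whether the downloaded content already contains the parts listed above, you can open it in memory with the standard zipfile module and compare member names. The following is a rough sketch. The helper name missing_xlsx_parts is hypothetical, and the part names are taken directly from the list above. A workbook produced by another tool may name its sheet parts differently, so a non-empty result is a hint rather than proof that the file is broken.

```python
import io
import zipfile

# Parts from the minimum .xlsx structure listed above. sharedStrings.xml is
# omitted because it is only needed when the workbook has string cells.
REQUIRED_PARTS = [
    '[Content_Types].xml',
    '_rels/.rels',
    'docProps/app.xml',
    'docProps/core.xml',
    'xl/workbook.xml',
    'xl/_rels/workbook.xml.rels',
    'xl/worksheets/sheet1.xml',
]


def missing_xlsx_parts(data: bytes) -> list:
    """Return the REQUIRED_PARTS that are absent from the zip in `data`.

    Raises ValueError if `data` does not start like a zip archive.
    """
    # A zip archive normally begins with the local-file-header signature PK\x03\x04.
    if not data.startswith(b'PK\x03\x04'):
        raise ValueError('content does not look like a zip archive')
    # Read the archive from memory; nothing is written to disk.
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = set(zf.namelist())
    return [part for part in REQUIRED_PARTS if part not in names]
```

    An empty list means every listed part is present, so writing the same bytes to a .xlsx file in binary mode should be enough.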

    I recently pieced together an Excel file from scratch; if you want to see the code, check out https://github.com/PydPiper/pylightxl