Search code examples
pythonpandascgi

Pandas Dataframe as Excel via CGI


I would like to return a Pandas DataFrame in a CGI script as an Excel file based on a query. However, Excel cannot open the file, reports that the file is corrupted.

The transmitted file seems to be a byte string. The beginning of the file looks like this:

b'PK\x03\x04\x14\x00\x00\x00\x08\x00\x00\x00?\x00\xf8J\xa7!Q\x01\x00\x00\x90\x04\x00\x00\x13\x00\x00\x00[Content_Types].xml\xad\x94\xcbN\xc30\x10E\xf7H\xfcC\xe4-J\xdc\xb2@\x085\xed\x82\xc7\x12*Q>\xc0\xd8\x93\xc6\xaac[\x9eii\xff\x9eI\xf

The following code leads to the result:

def get_excel(df: pd.DataFrame):
    xlsdata = BytesIO()
    writer = pd.ExcelWriter(xlsdata)
    data_df.to_excel(writer, index=False, sheet_name='Sheet1')
    writer.close()
    xlsdata.seek(0)
    return xlsdata

def send_excel_response(excel_file: BytesIO, filename: str):
    print("Content-type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")
    print("Content-Disposition: attachment; filename={}.xlsx".format(filename))
    print("Content-Length: {}".format(excel_file.getbuffer().nbytes))
    print()
    excel_bytes = excel_file.getvalue()
    print(excel_bytes)

When I look at the output xlsdata.getvalue() in the debugger, the ByteString is also included.

I have already tried to save the BytesIO object without the CGI script. The file can be opened by Excel.

    with open("test.xlsx", 'wb')  as data_file:
        data_file.write(xlsdata.getvalue())
        data_file.close()

The content looks different:

504b 0304 1400 0000 0800 0000 3f00 f84a
a721 5101 0000 9004 0000 1300 0000 5b43

Does anyone have an idea how to solve the problem?

Best h0mebrewer


Solution

  • You're printing bytes using print which will not print the raw bytes but its string representation:

    >>> print(b'\x10\x11')
    b'\x10\x11'
    

    Instead, use sys.stdout.buffer.write() to write raw bytes to stdout. You'll also need to flush stdout before so that the print calls done before are flushed.

    So, replace:

    print(excel_bytes)
    

    with:

    sys.stdout.flush()
    sys.stdout.buffer.write(excel_bytes)