I would like to return a Pandas DataFrame in a CGI script as an Excel file based on a query. However, Excel cannot open the file, reports that the file is corrupted.
The transmitted file seems to be a byte string. The beginning of the file looks like this:
b'PK\x03\x04\x14\x00\x00\x00\x08\x00\x00\x00?\x00\xf8J\xa7!Q\x01\x00\x00\x90\x04\x00\x00\x13\x00\x00\x00[Content_Types].xml\xad\x94\xcbN\xc30\x10E\xf7H\xfcC\xe4-J\xdc\xb2@\x085\xed\x82\xc7\x12*Q>\xc0\xd8\x93\xc6\xaac[\x9eii\xff\x9eI\xf
The following code leads to the result:
def get_excel(df: pd.DataFrame):
xlsdata = BytesIO()
writer = pd.ExcelWriter(xlsdata)
data_df.to_excel(writer, index=False, sheet_name='Sheet1')
writer.close()
xlsdata.seek(0)
return xlsdata
def send_excel_response(excel_file: BytesIO, filename: str):
print("Content-type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")
print("Content-Disposition: attachment; filename={}.xlsx".format(filename))
print("Content-Length: {}".format(excel_file.getbuffer().nbytes))
print()
excel_bytes = excel_file.getvalue()
print(excel_bytes)
When I look at the output xlsdata.getvalue() in the debugger, the ByteString is also included.
I have already tried to save the BytesIO object without the CGI script. The file can be opened by Excel.
with open("test.xlsx", 'wb') as data_file:
data_file.write(xlsdata.getvalue())
data_file.close()
The content looks different:
504b 0304 1400 0000 0800 0000 3f00 f84a
a721 5101 0000 9004 0000 1300 0000 5b43
Does anyone have an idea how to solve the problem?
Best h0mebrewer
You're printing bytes using print
which will not print the raw bytes but its string representation:
>>> print(b'\x10\x11')
b'\x10\x11'
Instead, use sys.stdout.buffer.write()
to write raw bytes to stdout. You'll also need to flush stdout before so that the print
calls done before are flushed.
So, replace:
print(excel_bytes)
with:
sys.stdout.flush()
sys.stdout.buffer.write(excel_bytes)