I'm working on a Flask backend using REST API that generates a csv report. The data in the report can contain special characters, which because of Excel's weirdness means that you need to encode the csv with utf-8-sig
encoding (otherwise Excel uses cp1252
(?) encoding and special characters aren't displayed properly).
I'm sure I generate the csv correctly since if I write it to a local file using Python it opens in Excel as expected. Same goes for zipping it (taking a BytesIO
object and creating a new BytesIO
object out of it using Python's zipfile
library) and providing it to the client as a zip file.
However, if I return the BytesIO
object directly using Flask's send_file()
function, the BOM is lost and special chars aren't displayed properly in Excel. I'm not sure if it's caused by the send_file
itself (I've tried drilling down in the implementation but I didn't see anything that could cause this) or by the browser (it happens with both Google Chrome and Firefox), but it's an issue.
Do you have any ideas how to work around this? I would like to provide a csv file that can be opened in Excel by double clicking it and it displays all of the characters properly. The code in question is provided below. Thanks!
response.seek(0)
filename = f'report {datetime.now().strftime("%Y-%m-%d %H-%M-%S")}'
response = BytesIO(response.read().encode("utf-8-sig")) # add BOM to make Excel happy
response.seek(0)
return send_file(response, download_name=f'{filename}.csv', as_attachment=True, mimetype="content-type: text/csv; charset=utf-8")
EDIT: Using Postman I have determined that the BOM is scrapped by the browsers. The file is provided correctly by the backend and is then modified while being saved by the browser.
I found a solution! You need to include the BOM characters twice since the first set is removed by the browsers because that's how the spec specifies the UTF8 encoded file should be processed... This technically makes your csv file invalid, however, it is Excel compatible after being downloaded through a browser.
More info: https://stackoverflow.com/a/42717677/9682679
Code modified as described above:
response = BytesIO(b"\xEF\xBB\xBF" + response.read().encode("utf-8-sig")) # add BOM twice to fool the browser and make Excel happy
EDIT: Another workaround was provided by @PanagiotisKanavos. You can also use octet-stream
mime type instead of text/csv
. In that case the web browser has no idea it's working with a text file and therefore doesn't touch it. The resulting code reads like this:
return send_file(response, download_name=f'{filename}.csv', as_attachment=True, mimetype="application/octet-stream")