I am having a problem with the following function:
def download_zip(key: str):
    response = requests.get('some url')
    if response.status_code == 200:
        header = response.headers.get('content-disposition')
        if type(header) == str:
            file_name = re.findall("filename=(.+)", header)[0][1:-1]
        else:
            file_name = f"{key}.zip"
        with open(file_name, mode='wb') as f:
            f.write(response.content)
        print(f"Written: {file_name}")
    else:
        print(f"Failed: {key} -> {response.status_code}")
The URL 'some url' points to a zip file.
When downloading the file with a browser, its name contains Japanese characters, which are preserved: 音のないレプリカ.
With my code, they are not preserved and instead produce something like this: é³ã®ãªãã¬ããªã«.
How can I make it preserve those characters?
Case-specific solution: the header value was decoded as ISO-8859-1 although the server apparently sent UTF-8 bytes, so fix the encoding to what (you think) it should be:
file_name.encode('iso-8859-1').decode('utf-8')
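A minimal sketch of that round trip, assuming the server really did send raw UTF-8 bytes that were read as ISO-8859-1. The helper name `fix_mojibake` is made up for illustration. It returns the string unchanged when the guess turns out to be wrong, so a correctly decoded name is not damaged.

```python
def fix_mojibake(name: str) -> str:
    """Undo a UTF-8-read-as-ISO-8859-1 mix-up; return name unchanged on failure."""
    try:
        # Recover the original bytes, then decode them as UTF-8.
        return name.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the name has characters outside ISO-8859-1 (already fine),
        # or its bytes are not valid UTF-8 (the guess was wrong).
        return name


# Simulate the problem: UTF-8 bytes decoded with the wrong codec.
garbled = "音のないレプリカ".encode("utf-8").decode("iso-8859-1")
restored = fix_mojibake(garbled)
```

In the function from the question, this would be applied to `file_name` right after it is extracted from the header.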
More general solution: yell at the owner of the web server to fix their headers so that they properly set the filename* field of Content-Disposition. Then you could have some confidence as to what the correct encoding is.
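If the server does send filename*, the value has the form charset'language'percent-encoded-name, so the client can decode it without guessing. The following is a rough sketch, not a full header parser: the function name `filename_from_header` and the regular expressions are assumptions for illustration, and they ignore edge cases such as escaped quotes or an unknown charset name.

```python
import re
from typing import Optional
from urllib.parse import quote, unquote


def filename_from_header(header: str) -> Optional[str]:
    """Extract a filename from a Content-Disposition value, preferring filename*."""
    # Extended form: filename*=charset'language'percent-encoded-value
    match = re.search(
        r"filename\*\s*=\s*([^';\s]+)'[^']*'([^;\s]+)", header, flags=re.IGNORECASE
    )
    if match:
        charset, encoded = match.groups()
        return unquote(encoded, encoding=charset)
    # Plain form: filename="name" or filename=name
    match = re.search(r'filename\s*=\s*"?([^";]+)"?', header, flags=re.IGNORECASE)
    return match.group(1).strip() if match else None


# Build an example header the way a well-behaved server might.
example = "attachment; filename*=UTF-8''" + quote("音のないレプリカ.zip")
decoded = filename_from_header(example)
```

A name taken from the plain filename field may still need the case-specific fix above, since that field carries no encoding information.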