I run a script four times a day that uses the requests module to download a file, which I then load into a database. Nine times out of ten the script works flawlessly. When it fails, it is because of a character in the downloaded file that my script, as written, cannot handle. For example, here's the error I got today: UnicodeEncodeError: 'ascii' codec can't encode characters in position 379-381: ordinal not in range(128).
I downloaded the file another way, and the character at position 380, which I believe is what stops my script, is "∞". Here's the place in my script where it chokes:
##### request file
r = requests.get('https://resources.example.com/requested_file.csv')
##### create the database importable csv file
ld = open('/requested_file.csv', 'w')
print(r.text, file=ld)
I know this probably has to do with encoding the text somehow before printing it to the .csv file, and it is probably simple for someone who knows what they are doing, but after many hours of research I'm about to cry. Thanks in advance for your help!
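The traceback can be reproduced without the download. This is a minimal sketch, assuming the failure comes from `open()` falling back to an ASCII default encoding (which can happen when a script runs from a scheduler with a bare locale; that is a guess about this setup, not something the post confirms). The helper name `try_write` and the sample row are made up for illustration.

```python
import os
import tempfile


def try_write(text, encoding):
    """Write text to a throwaway file; return True on success, False on UnicodeEncodeError."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        with open(path, "w", encoding=encoding) as fh:
            print(text, file=fh)
        return True
    except UnicodeEncodeError:
        return False
    finally:
        os.remove(path)


sample = "id,value\n1,\u221e"  # hypothetical row containing the infinity sign

ascii_ok = try_write(sample, "ascii")  # mimics an ASCII default encoding
utf8_ok = try_write(sample, "utf-8")   # explicit encoding that can represent the character
print(ascii_ok, utf8_ok)
```

If the database import can accept UTF-8, passing `encoding='utf-8'` to `open()` in the script above would likely be the smaller change, since it keeps the character instead of discarding it.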
I tried a lot of different things but here's what ended up working for me:
import requests
import io
##### request file
r = requests.get('https://resources.example.com/requested_file.csv')
##### create the db importable csv file
with open('requested_file_TEMP.csv', 'wb') as ld:
    ld.write(r.text.encode())  # str.encode() defaults to UTF-8
##### run the temp file through the following code to get rid of any non-ascii characters
##### in the file; non-ascii characters can/will cause the script to choke
##### (the output name below is a placeholder; it must differ from the temp file,
##### because opening a file with 'w' truncates it before it can be read)
with io.open('requested_file_TEMP.csv', 'r', encoding='utf-8', errors='ignore') as infile, \
     io.open('requested_file.csv', 'w', encoding='ascii', errors='ignore') as outfile:
    for line in infile:
        # split() + print also collapses each run of whitespace into a single space
        print(*line.split(), file=outfile)
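The temp file is not strictly needed, since the same stripping can be done in memory. This is a minimal sketch, assuming that dropping non-ASCII characters is acceptable for the import. `strip_non_ascii` and `save_ascii_csv` are hypothetical helper names, and unlike the loop above this version leaves the whitespace in each line untouched.

```python
def strip_non_ascii(text):
    """Drop every character outside the ASCII range."""
    return text.encode("ascii", errors="ignore").decode("ascii")


def save_ascii_csv(text, path):
    """Write text to path with non-ASCII characters removed."""
    # newline="" keeps the line endings exactly as they appear in the text
    with open(path, "w", encoding="ascii", newline="") as outfile:
        outfile.write(strip_non_ascii(text))


# Usage with the download from above (network call, so left commented out):
# import requests
# r = requests.get('https://resources.example.com/requested_file.csv')
# save_ascii_csv(r.text, 'requested_file.csv')

print(strip_non_ascii("1,\u221e,ok"))  # prints "1,,ok"
```

Silently removing characters changes the data, so it may be worth checking whether the affected column matters before relying on this.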