Tags: python, unicode, python-requests, encode

Python script chokes on a downloaded file because of unicode encode error


I run a script four times a day that uses the requests module to download a file, which I then load into a database. Nine times out of ten the script works flawlessly. When it fails, the cause is a character in the downloaded file that my script, as written, cannot handle. For example, here's the error I got today: UnicodeEncodeError: 'ascii' codec can't encode characters in position 379-381: ordinal not in range(128). I downloaded the file another way, and the character at position 380, which I believe is responsible for stopping my script, is "∞". Here's the place in my script where it chokes:

##### request file

r = requests.get('https://resources.example.com/requested_file.csv')

##### create the database importable csv file

ld = open('/requested_file.csv', 'w')
print(r.text, file=ld)

I know this probably has to do with encoding the file somehow before printing it to the .csv file, and it is probably simple for someone who knows what they are doing, but after many hours of research I'm about to cry. Thanks in advance for your help!
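The error can probably be reproduced without the download. The sketch below assumes the cause is that the output file ends up with ASCII as its text encoding (for example when the script runs under a minimal locale, which is a guess, not something stated above). It uses in-memory streams instead of real files, and the sample string is made up for illustration.

```python
import io

# Made-up sample text containing the infinity sign from the question
sample = "limit -> \u221e"

# A text stream that encodes to ASCII, standing in for a file whose
# default encoding turned out to be ASCII (assumption about the cause)
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
try:
    print(sample, file=ascii_stream)
    ascii_stream.flush()
    error = None
except UnicodeEncodeError as exc:
    error = exc  # same exception type as in the question

# The same write succeeds once the stream encodes to UTF-8
utf8_stream = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
print(sample, file=utf8_stream)
utf8_stream.flush()
written = utf8_stream.buffer.getvalue()

print(type(error).__name__)
```

If this matches the real failure, the character itself is not the problem; the encoding chosen for the output file is.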


Solution

  • I tried a lot of different things but here's what ended up working for me:

    import requests
    import io
    
    ##### request file
    
    r = requests.get('https://resources.example.com/requested_file.csv')
    
    ##### create the db importable csv file
    
    # write the response text as UTF-8 bytes; the with block closes the file
    with open('requested_file_TEMP.csv', 'wb') as ld:
        ld.write(r.text.encode('utf-8'))
    
    ##### run the temp file through the following code to get rid of any non-ascii characters
    ##### in the file; non-ascii characters can/will cause the script to choke
    
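    # NOTE: the output path must differ from the input path; opening the same file
    # with 'w' truncates it before it can be read. 'requested_file.csv' is an assumed name.
    with io.open('requested_file_TEMP.csv', 'r', encoding='utf-8', errors='ignore') as infile, \
         io.open('requested_file.csv', 'w', encoding='ascii', errors='ignore') as outfile:
        for line in infile:
            # split() + print collapses each run of whitespace into a single space
            print(*line.split(), file=outfile)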
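The approach above discards every non-ASCII character. If the database can accept UTF-8, an alternative is to keep the characters and state the encoding explicitly when writing. This is a minimal sketch, not the method used above: `save_utf8` and `save_bytes` are hypothetical helper names, and the sample string stands in for `r.text` so the example runs without network access.

```python
import os
import tempfile


def save_utf8(text, path):
    """Write text to path as UTF-8, independent of the locale's default encoding."""
    # newline="" keeps the line endings exactly as they appear in the text
    with open(path, "w", encoding="utf-8", newline="") as out:
        out.write(text)


def save_bytes(content, path):
    """Write raw bytes (for example requests' r.content) to path unchanged."""
    with open(path, "wb") as out:
        out.write(content)


# Stand-in for r.text; made-up data containing the infinity sign
sample_text = "id,value\r\n1,\u221e\r\n"
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "requested_file.csv")
save_utf8(sample_text, demo_path)

# Read the file back with the same encoding to confirm nothing was lost
with open(demo_path, "r", encoding="utf-8", newline="") as check:
    round_trip = check.read()
```

In the real script the call would look like `save_utf8(r.text, 'requested_file.csv')`, or `save_bytes(r.content, 'requested_file.csv')` to store the server's bytes untouched. Whether the database import accepts UTF-8 is something to verify separately.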