Tags: python, csv, urllib2

UnicodeEncodeError when creating .csv file


I am trying to create a .csv file from data that I have stored in a list from the Twitter search API. I have saved the last 100 tweets containing a keyword that I chose (in this case 'reddit'), and I am trying to save each tweet into its own cell in a .csv file. My code is below; it raises the following error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 0: ordinal not in range(128)

If anyone knows what I can do to fix this it would be greatly appreciated!

import sys
import os


import urllib
import urllib2
import json
from pprint import pprint
import csv

import sentiment_analyzer

import codecs

class Twitter:
    def __init__(self):
        self.api_url = {}
        self.api_url['search'] = 'http://search.twitter.com/search.json?'

    def search(self, params):

        url = self.make_url(params, apitype='search')
        data = json.loads(urllib2.urlopen(url).read().decode('utf-8').encode('ascii', 'ignore'))

        txt = []
        for obj in data['results']:
            txt.append(obj['text'])

        return '\n'.join(txt)

    def make_url(self, params, apitype='search'):


        baseurl = self.api_url[apitype] 
        return baseurl + urllib.urlencode(params)


if __name__ == '__main__':
    try:
        query = sys.argv[1]
    except IndexError:
        query = 'reddit'

    t = Twitter()

    s = sentiment_analyzer.SentimentAnalyzer()

    params = {'q': query, 'result_type': 'recent', 'rpp': 100}

    urlName = t.make_url(params)
    print urlName
    txt = t.search(params)

    print s.analyze_text(txt)

    myfile = open('reddit.csv', 'wb')
    wr = csv.writer(myfile, quoting=csv.QUOTE_MINIMAL)
    wr.writerow(txt)

Solution

  • From the Python 2 documentation for the csv module:

    Note

    This version of the csv module doesn’t support Unicode input. Also, there are currently some issues regarding ASCII NUL characters. Accordingly, all input should be UTF-8 or printable ASCII to be safe; see the examples in section Examples.

    That said, you can probably write the .csv file yourself without too much difficulty using Python's built-in Unicode string support, encoding each value to UTF-8 as the note recommends; there's also this answer.
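As a rough illustration of the note quoted above, here is a minimal sketch of writing one tweet per row with each value encoded as UTF-8. The names `write_tweets` and `PY2` are hypothetical and not part of the question's code. The sketch also assumes the tweets are passed as a list of Unicode strings: in the question, `txt` is a single joined string, so `wr.writerow(txt)` treats every character as its own cell, which is likely why the error reports position 0. The Python 3 branch is included only because the `csv` module there accepts text directly.

```python
# -*- coding: utf-8 -*-
import csv
import io
import sys

PY2 = sys.version_info[0] == 2  # hypothetical helper flag, not from the question


def write_tweets(path, tweets):
    """Write each tweet to its own single-cell row in a UTF-8 encoded CSV file."""
    if PY2:
        # Python 2: csv only handles byte strings, so open the file in binary
        # mode and encode every unicode value to UTF-8 before writing it.
        with open(path, 'wb') as f:
            writer = csv.writer(f, quoting=csv.QUOTE_MINIMAL)
            for tweet in tweets:
                writer.writerow([tweet.encode('utf-8')])
    else:
        # Python 3: csv works on text, so the file object does the encoding.
        with io.open(path, 'w', newline='', encoding='utf-8') as f:
            writer = csv.writer(f, quoting=csv.QUOTE_MINIMAL)
            for tweet in tweets:
                writer.writerow([tweet])
```

Applied to the question's code, this would presumably mean having `search` return the list `txt` itself rather than `'\n'.join(txt)`, and then calling `write_tweets('reddit.csv', txt)`.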