python-2.7 elasticsearch character-encoding non-ascii-characters python-unicode

Python 2.7 - Elasticsearch - SyntaxError: Non-ASCII character '\xc3'


I am using Elasticsearch with Python 2.7. I have to analyze and store a lot of text, and the following error often appears: SyntaxError: Non-ASCII character ... I wrote the following code to test it:

 import elasticsearch

 es = elasticsearch.Elasticsearch("127.0.0.1:9200")
 test = 'sarà'
 doc = {
 'ID':'123456',
 'field':unicode(test, errors='ignore'),
  }
 es.index('test_db','test',doc)

The error is:

SyntaxError: Non-ASCII character '\xc3' in file /home/user/PycharmProjects/ubuntu/asciiTest.py on line 4, but no encoding declared.

After reading other answers on Stack Overflow, I tried:

unicode(test, errors='ignore')

But I get the same error again. I don't know how to handle these special characters.


Solution

  • Python is telling you that you've used non-ASCII characters but haven't declared the encoding of the source code.

    The error is usually accompanied by a message referring you to PEP-263 - https://www.python.org/dev/peps/pep-0263/

    You can simply add the following as the first or second line of your source file:

    # coding=<encoding name>
    

    <encoding name> is the encoding you've used for the source code. It's advisable to use utf-8.

    When using non-ASCII characters, you ought to use Unicode strings. You can do this by prefixing the string literal with a u.

    E.g.

    test = u'sarà'
    

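    Remove all uses of unicode(). It cannot fix this error anyway: the SyntaxError is raised when the file is parsed, before any of your code runs. IMHO, unicode() should never be used without an explicit encoding.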
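Putting the pieces together, here is a minimal sketch of how the test file could look. It assumes the file is saved as UTF-8, and it leaves out the Elasticsearch call so that it runs without a server. The raw, decoded and lossy names are only illustrative and are not from the question.

```python
# -*- coding: utf-8 -*-
# The declaration above has to be on the first or second line (PEP 263).
from __future__ import print_function

# Unicode literal: the u prefix works in Python 2.7 and in Python 3.3+.
test = u'sarà'

# Hypothetical input: UTF-8 bytes as they might arrive from a file or socket.
raw = b'sar\xc3\xa0'

# Decode with an explicit encoding instead of relying on the ASCII default.
decoded = raw.decode('utf-8')

# For comparison: decoding as ASCII with 'ignore' silently drops the bytes
# of the accented character, so data is lost rather than fixed.
lossy = raw.decode('ascii', 'ignore')

# The document now holds a Unicode string, with no unicode() call needed.
doc = {
    'ID': u'123456',
    'field': test,
}

print(repr(decoded))
print(repr(lossy))
```

With this in place, doc can be passed to es.index() exactly as in the question. The comparison with lossy is there to show why errors='ignore' is a poor default: it hides the problem by discarding characters.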