I am using Elasticsearch with Python 2.7. I have to analyze and store a lot of text, and the following error often appears: SyntaxError: Non-ASCII character ... I wrote the following code to test this:
import elasticsearch
es = elasticsearch.Elasticsearch("127.0.0.1:9200")
test = 'sarà'
doc = {
    'ID': '123456',
    'field': unicode(test, errors='ignore'),
}
es.index('test_db','test',doc)
The error is:
SyntaxError: Non-ASCII character '\xc3' in file /home/user/PycharmProjects/ubuntu/asciiTest.py on line 4, but no encoding declared.
After reading other answers on Stack Overflow, I tried:
unicode(test, errors='ignore')
But I get the same error again. I don't know how to handle these special characters.
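Python is telling you that you've used non-ASCII characters but haven't declared the encoding of the source file. The error is raised while the file is being parsed, before any of your code runs, which is why calling unicode() makes no difference.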
The error is usually accompanied by a message referring you to PEP-263 - https://www.python.org/dev/peps/pep-0263/
You can simply add the following as the first or second line of your source file:
# coding=<encoding name>
Here, <encoding name> is the encoding in which the source file is actually saved. It's advisable to use utf-8.
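As a minimal sketch of what the top of the file could look like, assuming your editor really saves the file as UTF-8 (the variable name `encoded` is only illustrative):

```python
# -*- coding: utf-8 -*-
# The declaration above has to sit on line 1 or 2 of the file (PEP 263).
# It is an assumption here that the file is saved as UTF-8.

test = u'sarà'  # Unicode literal made of four characters

# Encoded as UTF-8, the 'à' becomes two bytes, 0xC3 0xA0.
# The first of them is the '\xc3' named in the SyntaxError.
encoded = test.encode('utf-8')
```

The Emacs-style `# -*- coding: utf-8 -*-` form and the shorter `# coding=utf-8` form are both accepted by PEP 263.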
When using non-ASCII characters, you ought to use Unicode strings. You can do this by simply prefixing the string literal with a u.
E.g.
test = u'sarà'
Remove all uses of unicode(). IMHO, unicode() should never be used without an explicit encoding.
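To illustrate why the encoding matters, here is a small sketch assuming the bytes arrive as UTF-8 (for example from a file or a socket). The names `raw`, `decoded` and `lossy` are hypothetical. On Python 2, unicode(raw, errors='ignore') falls back to the default encoding, which is normally ASCII, so it behaves like the lossy case:

```python
# -*- coding: utf-8 -*-
raw = b'sar\xc3\xa0'  # UTF-8 bytes for "sarà" (assumed input encoding)

# Explicit encoding: the two bytes 0xC3 0xA0 decode to the single character 'à'.
decoded = raw.decode('utf-8')

# ASCII with errors ignored: the non-ASCII bytes are silently dropped,
# so the accented letter disappears from the stored text.
lossy = raw.decode('ascii', 'ignore')
```

With the explicit encoding the text survives intact, whereas the ASCII fallback quietly stores "sar" in place of the original word.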