Previously I managed to solve problems with ASCII vs UTF-8 encoding using the following code.
import sys
reload(sys)
sys.setdefaultencoding('utf8')
or sometimes this was enough:
html = html.decode("utf-8")
The difference now is that I am using 'ß' directly in the code of one of my regex functions (before, it only appeared in my data and variables). The program crashes even if I comment out the part containing 'ß'.
SyntaxError: Non-ASCII character '\xc3' in file bla/bla/bla.py on line 75, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
The following line is causing the problem:
def adjust_city_name(name):
    matchesfound = re.search('((Stadt|Große Kreisstadt)\s)?(.*)', name, re.IGNORECASE)
What could be some possible ways to overcome this problem?
Full traceback:
Traceback (most recent call last):
  File "bla/bla/crwl.py", line 2, in <module>
    from linkParser import *
  File "bla/bla/linkParser.py", line 2, in <module>
    from helpFunctions import *
  File "bla/bla/helpFunctions.py", line 75
SyntaxError: Non-ASCII character '\xc3' in file bla/bla/helpFunctions.py on line 75, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
You need to add an encoding declaration at the top of your file, on the first or second line:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
You can read more about it in PEP 263, the document linked in the error message.
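As a rough sketch of what helpFunctions.py could look like with the declaration in place: the return statement and the u'' prefix are assumptions added for illustration, since the question only shows the first line of the function body. In Python 2 the parser rejects any non-ASCII byte in an undeclared file, including bytes inside comments, which would explain why commenting the line out did not help.

```python
# -*- coding: utf-8 -*-
# The declaration above only takes effect on the first or second line of the file,
# and the file itself has to be saved as UTF-8.
import re


def adjust_city_name(name):
    # Unicode literal so that 'ß' is treated as one character rather than two bytes.
    # The backslash is doubled because this is not a raw string.
    pattern = u'((Stadt|Große Kreisstadt)\\s)?(.*)'
    matchesfound = re.search(pattern, name, re.IGNORECASE | re.UNICODE)
    # Assumption: the original function goes on to use the last group,
    # i.e. the name without the optional prefix.
    return matchesfound.group(3)
```

Under these assumptions, adjust_city_name(u'Stadt Köln') returns u'Köln', and a name without either prefix is returned unchanged.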