Tags: python, encoding, utf-8, non-ascii-characters

Crash when using 'scharfes s' or 'ß' in python


Previously I managed to solve problems with ASCII vs UTF-8 encoding using the following code.

    import sys
    reload(sys)
    sys.setdefaultencoding('utf8')

or sometimes this was enough:

    html = html.decode("utf-8")

The difference now is that I am using 'ß' directly in my code, in one of my regex functions (before, it only appeared in my data/variables). The program crashes even if I comment out the part with 'ß'.

    SyntaxError: Non-ASCII character '\xc3' in file bla/bla/bla.py on line 75, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

The following line is causing the problem:

    def adjust_city_name(name):
        matchesfound = re.search('((Stadt|Große Kreisstadt)\s)?(.*)', name, re.IGNORECASE)

What could be some possible ways to overcome this problem?

Full traceback:

    Traceback (most recent call last):
      File "bla/bla/crwl.py", line 2, in <module>
        from linkParser import *
      File "bla/bla/linkParser.py", line 2, in <module>
        from helpFunctions import *
      File "bla/bla/helpFunctions.py", line 75
    SyntaxError: Non-ASCII character '\xc3' in file bla/bla/helpFunctions.py on line 75, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

Solution

  • You need to add an encoding declaration at the top of your file (it must be on the first or second line):

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    

    You can read more about it in PEP 263: http://python.org/dev/peps/pep-0263/
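As a rough illustration, here is how helpFunctions.py might look with the declaration in place. This is only a sketch: the question shows just the first line of `adjust_city_name`, so the return value (group 3, the name without the prefix) and the `CITY_PREFIX` constant are assumptions. The pattern is written as a unicode literal so that 'ß' is treated as one character on Python 2 as well.

```python
# -*- coding: utf-8 -*-
# The declaration above must sit on line 1 or 2 of the real file (PEP 263).
import re

# Unicode pattern: on Python 2 a plain byte string would hold 'ß' as two bytes.
# CITY_PREFIX is a hypothetical name, not taken from the question.
CITY_PREFIX = re.compile(u'((Stadt|Große Kreisstadt)\\s)?(.*)',
                         re.IGNORECASE | re.UNICODE)


def adjust_city_name(name):
    # Assumption: the caller wants the city name without the optional
    # "Stadt " / "Große Kreisstadt " prefix, which is capture group 3.
    # `name` should be text (unicode on Python 2, str on Python 3).
    matchesfound = CITY_PREFIX.search(name)
    return matchesfound.group(3)
```

With this in place, `adjust_city_name(u'Große Kreisstadt Freising')` should give `u'Freising'`, and a name without a prefix comes back unchanged. The file itself must also be saved as UTF-8, otherwise the declaration will not match the bytes on disk.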