Tags: python, string, python-2.7, ascii

How to detect a non-ASCII character in Python?


I'm parsing multiple XML files with Python 2.7, and some of them contain strings like string = "[2,3,13,37–41,43,44,46]". I split these to get a list of all the elements, and then I have to detect the elements that contain "–", such as "37–41". It turns out this is not a regular hyphen but a non-ASCII character (U+2013, an en dash):

elements = [u'2', u'3', u'13', u'37\u201341', u'43', u'44', u'46']

So I need something like

for e in elements:
    if "–" in e:
        # do something about it

If I use that non-ASCII character in the if condition, I get an error: "SyntaxError: Non-ASCII character '\xe2' in file...".

I tried replacing the if condition with a call to re.search:

re.search('\xe2', e)

but that doesn't match either. So I'm looking for a way to either convert that non-ASCII character to a regular ASCII "-" or use its character code directly in the search expression.


Solution

  • # -*- coding: utf-8 -*-
    
    import re
    
    elements = [u'2', u'3', u'13', u'37\u201341', u'43', u'44', u'46']
    
    for e in elements:
        # Remove printable ASCII (space through tilde); anything left over is not printable ASCII.
        if re.sub(r'[ -~]', '', e) != "":
            # do something here
            print("-")
    

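    re.sub('[ -~]', '', e) removes every printable ASCII character from e (the range from space through tilde; each match is replaced with ""), so only the other characters remain, such as the non-ASCII dash. If the result is not empty, e contains at least one such character. Note that ASCII control characters such as tab and newline fall outside [ -~], so they are left in as well.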

    Hope this helps.
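
If only the dash itself matters, a more direct route is to write it as the escape u'\u2013', which keeps the source file pure ASCII and so avoids the SyntaxError from the question. The following is a minimal sketch, assuming the elements are unicode strings as in the question; the names EN_DASH, has_non_ascii, and normalize_dash are made up for illustration. It should run unchanged on Python 2.7 and on Python 3.3+.

```python
# Sketch only: EN_DASH, has_non_ascii and normalize_dash are hypothetical
# helper names, not part of the original question or answer.

EN_DASH = u'\u2013'  # the character inside u'37\u201341', written as an escape


def has_non_ascii(text):
    """Return True if any character has a code point above 127."""
    return any(ord(ch) > 127 for ch in text)


def normalize_dash(text):
    """Replace every en dash with a plain ASCII hyphen-minus."""
    return text.replace(EN_DASH, u'-')


elements = [u'2', u'3', u'13', u'37\u201341', u'43', u'44', u'46']

# Elements that contain the en dash (the "ranges" in the question's data).
ranges = [e for e in elements if EN_DASH in e]

# The same list with the en dash converted to a regular "-".
cleaned = [normalize_dash(e) for e in elements]

print(ranges)
print(cleaned)
```

Here ranges holds only the element u'37\u201341', and cleaned contains u'37-41' in its place. has_non_ascii is the general check: unlike the [ -~] pattern, it does not flag tabs or newlines, because it tests the code point rather than printability.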