I'm parsing multiple XML files with Python 2.7. They contain strings like: string = "[2,3,13,37–41,43,44,46]"
I split them to get a list of all elements, and then I have to detect elements containing "–", like "37–41". It turns out this is not a regular dash but a non-ASCII character:
elements = [u'2', u'3', u'13', u'37\u201341', u'43', u'44', u'46']
So I need something like
for e in elements:
    if "–" in e:
        # do something about it
If I use that non-ASCII character in the if expression, I get an error: "SyntaxError: Non-ASCII character '\xe2' in file..."
I tried replacing the if expression with this re call:
re.search('\xe2', e)
but that does not work either. So I'm looking for a way to either convert that non-ASCII character to a regular ASCII "-" or use its character code directly in the search expression.
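# -*- coding: utf-8 -*-
import re

elements = [u'2', u'3', u'13', u'37\u201341', u'43', u'44', u'46']
for e in elements:
    # Remove every printable ASCII character (space through tilde);
    # anything left over must be non-ASCII.
    if re.sub('[ -~]', '', e) != "":
        # do something here
        print "-"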
re.sub('[ -~]', '', e)
strips every printable ASCII character (space through tilde) from e by replacing it with "",
so only the non-ASCII characters of e remain. If the result is not empty, e contains at least one non-ASCII character.
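If you only care about that one character, a more targeted alternative is to compare against the escape u'\u2013' (the en dash shown in your elements list) directly. Because the escape is written in plain ASCII, the source file contains no non-ASCII bytes and the SyntaxError does not come up. The sketch below is only an illustration: the names EN_DASH and expand_ranges are made up here, and expanding "37–41" into 37 through 41 is an assumption about what "do something about it" means in your case.

```python
from __future__ import print_function

# The en dash from u'37\u201341', written as an ASCII-only escape.
EN_DASH = u'\u2013'

def expand_ranges(elements):
    """Hypothetical helper: return a flat list of ints, expanding en-dash ranges."""
    numbers = []
    for e in elements:
        if EN_DASH in e:
            # Assumes exactly one en dash between two integers, e.g. u'37\u201341'.
            start, end = e.split(EN_DASH)
            numbers.extend(range(int(start), int(end) + 1))
        else:
            numbers.append(int(e))
    return numbers

elements = [u'2', u'3', u'13', u'37\u201341', u'43', u'44', u'46']

# Option 1: just convert the en dash to a regular ASCII hyphen.
normalized = [e.replace(EN_DASH, u'-') for e in elements]

# Option 2: detect the en dash and expand the range it denotes.
print(expand_ranges(elements))
```

Both options work on the unicode strings you already have, so no regular expression is needed. If other dash variants can appear in your files, you would have to check for each of them as well.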
Hope this helps.