I am trying to write a regular expression that would match Unicode lowercase characters in Python 3. I'm using the re
library. For example, re.findall(some_pattern, 'u∏ñKθ')
should return ['u', 'ñ', 'θ']
.
In Sublime Text, I could simply type [[:lower:]]
to find these characters.
I'm aware that Python can match on any Unicode character with re.compile('[^\W\d_]')
, but I specifically need to differentiate between uppercase and lowercase. I'm also aware that re.compile('[a-z]')
would match any ASCII lowercase character, but my data is UTF-8, and it includes lots of non-ASCII characters—I checked.
Is this possible with regular expressions in Python 3, or will I need to take an alternative approach? I know other ways to do it. I was just hoping to use regex.
You can use the regex package if using a third party package is acceptable.
>>> import regex
>>> s = 'ABCabcÆæ'
>>> m = regex.findall(r'[[:lower:]]', s)
>>> m
['a', 'b', 'c', 'æ']