Here are two examples of string methods from the Python Data Science Handbook that I am having trouble understanding.
str.extract()
import pandas as pd
monte = pd.Series(['Graham Chapman', 'John Cleese', 'Terry Gilliam',
                   'Eric Idle', 'Terry Jones', 'Michael Palin'])
monte.str.extract('([A-Za-z]+)')
This operation returns the first name of each element in the Series. I don't understand the expression passed to the extract function.
str.findall()
monte.str.findall(r'^[^AEIOU].*[^aeiou]$')
This operation returns the original element if it starts and ends with consonants, and an empty list otherwise. I figure that the ^ operator stands for negation of vowels, and that the * operator combines the upper- and lower-case vowels. Yet I do not understand the rest of the operators.
Please help me with understanding these input expressions. Thanks in advance.
The first ^ matches at the beginning of the string, whereas $ matches at the end of the string. Here is an example:
>>> import re
>>> s = 'a123a'
>>> re.findall('^a', s)
['a']
>>>
This returns only one a, even though s contains two, because the ^ sign only matches at the beginning of the string.
The same goes for $, which only matches at the end of the string. Here is an example:
>>> import re
>>> s = 'a123a'
>>> re.findall('a$', s)
['a']
>>>
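Putting the pieces together, here is a rough sketch of how both patterns from the question can be read. It uses only the standard re module on a plain list rather than a pandas Series (that substitution, and the variable names, are my own choices to keep the example self-contained). Note that a ^ inside square brackets, as in [^AEIOU], is not an anchor: it negates the character class.

```python
import re

names = ['Graham Chapman', 'John Cleese', 'Terry Gilliam',
         'Eric Idle', 'Terry Jones', 'Michael Palin']

# ([A-Za-z]+)
#   [A-Za-z]  one letter, upper or lower case
#   +         one or more of the preceding item
#   ( ... )   capture group: the part that gets extracted
# The match stops at the space, so only the first word is captured.
first_names = [re.search(r'([A-Za-z]+)', n).group(1) for n in names]

# ^[^AEIOU].*[^aeiou]$
#   ^         anchor: beginning of the string
#   [^AEIOU]  one character that is NOT an uppercase vowel
#   .*        any character (.), zero or more times (*)
#   [^aeiou]  one character that is NOT a lowercase vowel
#   $         anchor: end of the string
pattern = r'^[^AEIOU].*[^aeiou]$'
consonant_names = [n for n in names if re.findall(pattern, n)]

print(first_names)
print(consonant_names)
```

Because the pattern is anchored at both ends, findall returns either the whole string or nothing, which is why the Series version gives the original element or an empty list.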
Edited:
The r prefix marks a raw string. A raw string is taken exactly as it is written. For example, a backslash \ doesn't start an escape sequence, it is just a regular backslash.
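A small sketch of the difference, in case it helps (the variable names are just for illustration). This matters for regular expressions because sequences such as \d are meant to reach the re module with the backslash intact.

```python
import re

plain = '\n'   # escape sequence: a single newline character
raw = r'\n'    # raw string: two characters, a backslash and the letter n

print(len(plain), len(raw))

# Without the r prefix, the backslash has to be doubled to get the same pattern.
same_pattern = ('\\d+' == r'\d+')

# \d+ matches one or more digits.
digits = re.findall(r'\d+', 'a123a')
print(same_pattern, digits)
```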