Please explain, step by step, how the code below produces the results shown, including the questions that follow. Thanks!
df['text'].str.replace(r'(\w+day\b)', lambda x: x.groups()[0][:3])
What is `x` in `Series.str`? I can't examine it. What is `x` in `x.groups()`, and what does `groups()` do? What is `[0]` in `x.groups()[0][:3]`?

Given the dataframe `df` below:
0 Monday: The doctor's appointment is at 2:45pm.
1 Tuesday: The dentist's appointment is at 11:30...
2 Wednesday: At 7:00pm, there is a basketball game!
3 Thursday: Be back home by 11:15 pm at the latest.
4 Friday: Take the train at 08:10 am, arrive at ...
The code transforms the above into:
0 Mon: The doctor's appointment is at 2:45pm.
1 Tue: The dentist's appointment is at 11:30 am.
2 Wed: At 7:00pm, there is a basketball game!
3 Thu: Be back home by 11:15 pm at the latest.
4 Fri: Take the train at 08:10 am, arrive at 09:...
Name: text, dtype: object
In addition to @AnuragDabas's comment, here is a breakdown of the processing using Python's `re` module:
>>> import re
>>> s = "Monday: The doctor's appointment is at 2:45pm."
>>> re.search(r'(\w+day\b)', s) # find any word ending in "day"
<re.Match object; span=(0, 6), match='Monday'>
>>> re.search(r'(\w+day\b)', s).groups() # get the matching groups
('Monday',)
>>> re.search(r'(\w+day\b)', s).groups()[0] # take the first element
'Monday'
>>> re.search(r'(\w+day\b)', s).groups()[0][:3] # get the first 3 characters
'Mon'
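To answer the sub-questions about `groups()` and `[0]` directly: `groups()` returns a tuple with one entry per capture group, so `[0]` picks the first (and here only) group, and `[:3]` is ordinary string slicing. The sketch below uses only the standard `re` module; `m.group(1)` and `m[1]` are standard alternatives to `m.groups()[0]`, not something the original code uses.

```python
import re

s = "Monday: The doctor's appointment is at 2:45pm."
m = re.search(r'(\w+day\b)', s)

# groups() returns a tuple with one entry per capture group (...)
print(m.groups())        # ('Monday',)

# Three equivalent ways to get the text of the first capture group
first = m.groups()[0]
assert first == m.group(1) == m[1]

# group(0) is the whole match; the group spans the whole pattern here,
# so both give the same text
assert m.group(0) == first

# [:3] is plain string slicing: the first three characters
print(first[:3])         # Mon
```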
When used in the context of `pandas.Series.str.replace`, the lambda is passed on to the `re.sub` function (as described in the documentation), which calls it with the match object for each match (this is the `x` in the lambda) and uses its return value as the replacement for that match (so "ABCDEFday" is replaced with "ABC").
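For a Series of strings, the effect is roughly that of calling `re.sub` on each element with the same pattern and callable. The sketch below illustrates that idea over a plain Python list using the complete rows from the question; it is a simplification, not pandas' actual implementation, and the helper name `shorten` is hypothetical. One caveat: in pandas 2.0 and later, `str.replace` defaults to `regex=False` and only accepts a callable replacement with `regex=True`, so on those versions the call from the question needs `regex=True` added.

```python
import re

# Untruncated sample rows from the question
texts = [
    "Monday: The doctor's appointment is at 2:45pm.",
    "Wednesday: At 7:00pm, there is a basketball game!",
    "Thursday: Be back home by 11:15 pm at the latest.",
]

pattern = r'(\w+day\b)'

def shorten(x):
    # Hypothetical named version of the lambda: x is an re.Match object
    # for one occurrence of the pattern, and the returned string replaces it
    return x.groups()[0][:3]

# Roughly what df['text'].str.replace(pattern, shorten, regex=True) does per element
result = [re.sub(pattern, shorten, t) for t in texts]
for line in result:
    print(line)
```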
Description of the second parameter of `.str.replace`:
repl: str or callable
Replacement string or a callable. The callable is passed the regex match object and must return a replacement string to be used. See re.sub().
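The "str or callable" distinction can be seen with `re.sub` directly. A replacement string can only copy group text back in through backreferences such as `\1`, while a callable receives the match object and can compute the replacement, for example by slicing it. The bracketed template below is just an illustrative choice.

```python
import re

s = "Monday: The doctor's appointment is at 2:45pm."
pattern = r'(\w+day\b)'

# repl as a string: \1 inserts the text of capture group 1 unchanged
as_string = re.sub(pattern, r'[\1]', s)

# repl as a callable: receives the Match object and returns the replacement text
as_callable = re.sub(pattern, lambda m: m.group(1)[:3], s)

print(as_string)    # [Monday]: The doctor's appointment is at 2:45pm.
print(as_callable)  # Mon: The doctor's appointment is at 2:45pm.
```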
NB. The regex is flawed in that any word ending in "day" will be processed. For example, if a line contained `Saturday: this is my birthday and not a workday!`, the result would be `Sat: this is my bir and not a wor!`.
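One possible fix, sketched here under the assumption that only English weekday names should be shortened, is to spell out the weekday abbreviations in the pattern. Because the capture group then holds exactly the three-letter abbreviation, a plain `\1` replacement string is enough and no callable is needed. The constant name `WEEKDAY` is illustrative only.

```python
import re

# Match a full weekday name and capture its three-letter abbreviation;
# [a-z]* absorbs the letters between the abbreviation and "day"
# (e.g. "nes" in "Wednesday", "ur" in "Saturday")
WEEKDAY = r'\b(Mon|Tue|Wed|Thu|Fri|Sat|Sun)[a-z]*day\b'

s = "Saturday: this is my birthday and not a workday!"
fixed = re.sub(WEEKDAY, r'\1', s)
print(fixed)  # Sat: this is my birthday and not a workday!
```

With pandas, the same pattern would be used as `df['text'].str.replace(WEEKDAY, r'\1', regex=True)`. Anchoring the original pattern with `^` would be another option if the day name always starts the line.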