How to assign Pandas.Series.str.extractall() result back to original dataset? (TypeError: incompatible index of inserted column with frame index)

Dataset brief overview

dete_resignations['cease_date'].head()

gives

result

dete_resignations['cease_date'].value_counts()

gives

result of the code above

What I tried

I was trying to extract only the year value (e.g. 05/2012 -> 2012) from 'dete_resignations['cease_date']' using 'Pandas.Series.str.extractall()' and assign the result back to the original dataframe. However, since not all the rows contain that specific string values(e.g. 05/2012), an error occurred.

Here are the code I wrote.

pattern = r"(?P<month>[0-1][0-9])/?(?P<year>[0-2][0-9]{3})"
years = dete_resignations['cease_date'].str.extractall(pattern)
dete_resignations['cease_date_'] = years['year']

'TypeError: incompatible index of inserted column with frame index'

I thought the 'years' share the same index with 'dete_resignations['cease']'. Therefore, even though two dataset's index is not identical, I expected python automatically matches and assigns the values to the right rows. But it didn't

Can anyone help solve this issue?

Much appreciated if someone can enlighten me!

Solution

If you only want the years, then don't catch the month in pattern, and you can use extract instead of extractall:

# the $ indicates end of string
# \d is equivalent to [0-9]
# pattern extracts the last digit groups
pattern = '(?P<year>\d+)$'
years = dete_resignations['cease_date'].str.extract(pattern)
dete_resignations['cease_date_'] = years['year']