import pandas as pd
df=pd.DataFrame({'Edition_TypeDate':
[''2016'','5 Oct 2017','2017','2 Aug 2009','Illustrated, Import','Import, 22 Feb 2018','Import, 14 Dec 2017','Import, 1 Mar 2018','Abridged, Audiobook, Box set',
'International Edition, 26 Apr 2012','Import, 2018','Box set, 15 Jun 2014','Unabridged, 6 Jul 2007']})
I have one of the columns in my book dataset. Now From this column, I want three New columns.
1.Edition_Type -->that includes Import, Illustrated or null if nothing is mentioned
2.Edition_Month--->that includes Aug, Oct or null if nothing is mentioned
3.Edition _Year--->that includes 2016,2017,2018 or null if nothing is mentioned
How to do it? Help me to def a function that can I apply to this.
You can use Series.str.extract
with keywords with |
for regex or
, for years (\d{4}$)
means get 4digits numbers from end of string:
df['Edition_Type'] = df['Edition_TypeDate'].str.extract(r'(Import|Illustrated)')
df['Edition_Month'] = df['Edition_TypeDate'].str.extract(r'(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)')
df['Edition _Year'] = df['Edition_TypeDate'].str.extract(r'(\d{4}$)')
print (df)
Edition_TypeDate Edition_Type Edition_Month \
0 2016 NaN NaN
1 5 Oct 2017 NaN Oct
2 2017 NaN NaN
3 2 Aug 2009 NaN Aug
4 Illustrated, Import Illustrated NaN
5 Import, 22 Feb 2018 Import Feb
6 Import, 14 Dec 2017 Import Dec
7 Import, 1 Mar 2018 Import Mar
8 Abridged, Audiobook, Box set NaN NaN
9 International Edition, 26 Apr 2012 NaN Apr
10 Import, 2018 Import NaN
11 Box set, 15 Jun 2014 NaN Jun
12 Unabridged, 6 Jul 2007 NaN Jul
Edition _Year
0 2016
1 2017
2 2017
3 2009
4 NaN
5 2018
6 2017
7 2018
8 NaN
9 2012
10 2018
11 2014