Search code examples
pythonpandasdata-sciencedata-processing

How extract Edition type ,Month and Year from this book price data set data in simple way?


import pandas as pd

df=pd.DataFrame({'Edition_TypeDate': 
                [''2016'','5 Oct 2017','2017','2 Aug 2009','Illustrated, Import','Import, 22 Feb 2018','Import, 14 Dec 2017','Import, 1 Mar 2018','Abridged, Audiobook, Box set',
'International Edition, 26 Apr 2012','Import, 2018','Box set, 15 Jun 2014','Unabridged, 6 Jul 2007']})

enter image description here

I have one of the columns in my book dataset. Now From this column, I want three New columns.

1.Edition_Type -->that includes Import, Illustrated or null if nothing is mentioned

2.Edition_Month--->that includes Aug, Oct or null if nothing is mentioned

3.Edition _Year--->that includes 2016,2017,2018 or null if nothing is mentioned

How to do it? Help me to def a function that can I apply to this.


Solution

  • You can use Series.str.extract with keywords with | for regex or, for years (\d{4}$) means get 4digits numbers from end of string:

    df['Edition_Type'] = df['Edition_TypeDate'].str.extract(r'(Import|Illustrated)')
    df['Edition_Month'] = df['Edition_TypeDate'].str.extract(r'(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)')
    df['Edition _Year'] = df['Edition_TypeDate'].str.extract(r'(\d{4}$)')
    print (df)
                          Edition_TypeDate Edition_Type Edition_Month  \
    0                                 2016          NaN           NaN   
    1                           5 Oct 2017          NaN           Oct   
    2                                 2017          NaN           NaN   
    3                           2 Aug 2009          NaN           Aug   
    4                  Illustrated, Import  Illustrated           NaN   
    5                  Import, 22 Feb 2018       Import           Feb   
    6                  Import, 14 Dec 2017       Import           Dec   
    7                   Import, 1 Mar 2018       Import           Mar   
    8         Abridged, Audiobook, Box set          NaN           NaN   
    9   International Edition, 26 Apr 2012          NaN           Apr   
    10                        Import, 2018       Import           NaN   
    11                Box set, 15 Jun 2014          NaN           Jun   
    12              Unabridged, 6 Jul 2007          NaN           Jul   
    
       Edition _Year  
    0           2016  
    1           2017  
    2           2017  
    3           2009  
    4            NaN  
    5           2018  
    6           2017  
    7           2018  
    8            NaN  
    9           2012  
    10          2018  
    11          2014