python pandas numpy feature-extraction feature-engineering

Conversion of years and months to months with string column input python

dataset example:

experience

5 month
nan
1 months
8 month
17 months
8 year
11 years
1.7 year
3.1 years
15.7 months
18 year
2017.2 years
98.3 years
68 year

I have a column with applicant's experience in years. It is very messy and I tried to go through it and create a sample. I have numbers followed by (month or months) and (year of years).

There are many nan entries and it should be ignored.

The goal is to create a column experience in months:

if nan
  copy nan to the corresponding column
if the row has month or months 
  copy the number to the corresponding column
if year or years in the row and the number <55 
  the number shall be multiplied by 12 and copied to the corresponding column
else copy nan to the corresponding column

How to achieve this?

Solution

my_dict = {'Experience': ['5 month', 'nan', '1 months', '8 month','17 months','8 year',
                          '11 years','1.7 year', '3.1 years', '15.7 months','18 year',
                          '2017.2 years', '98.3 years', '68 year']}

df = pd.DataFrame(my_dict)

# Create filter for month/months
month_filt = df['Experience'].str.contains('month')

# Filter DataFrame for rows that contain month/months 
df['Months'] = df.loc[month_filt, 'Experience'].str.strip('month|months')

# Create filter for year/years
year_filt = df['Experience'].str.contains('year')

# Filter DataFrame for rows that contain year/years
df['Years'] = df.loc[year_filt, 'Experience'].str.strip('year|years')

# Fill NaN in Years column
df.loc[df['Years'].isna(),'Years'] = np.nan

# Convert Years to months
df.loc[df['Months'].isna(),'Months'] = df['Years'].astype('float') * 12

# Set years greater than 55 to NaN
df.loc[df['Years'].astype('float') > 55, 'Months'] = np.nan

    Experience  Months  Years
0   5 month     5       NaN
1   nan         NaN     NaN
2   1 months    1       NaN
3   8 month     8       NaN
4   17 months   17      NaN
5   8 year      96      8
6   11 years    132     11
7   1.7 year    20.4    1.7
8   3.1 years   37.2    3.1
9   15.7 months 15.7    NaN
10  18 year     216     18
11  2017.2 yearsNaN 2017.2
12  98.3 years  NaN     98.3
13  68 year     NaN     68