dataset example:
experience
5 month
nan
1 months
8 month
17 months
8 year
11 years
1.7 year
3.1 years
15.7 months
18 year
2017.2 years
98.3 years
68 year
I have a column with applicant's experience in years. It is very messy and I tried to go through it and create a sample. I have numbers followed by (month or months) and (year of years).
There are many nan entries and it should be ignored.
The goal is to create a column experience in months:
if nan
copy nan to the corresponding column
if the row has month or months
copy the number to the corresponding column
if year or years in the row and the number <55
the number shall be multiplied by 12 and copied to the corresponding column
else copy nan to the corresponding column
How to achieve this?
my_dict = {'Experience': ['5 month', 'nan', '1 months', '8 month','17 months','8 year',
'11 years','1.7 year', '3.1 years', '15.7 months','18 year',
'2017.2 years', '98.3 years', '68 year']}
df = pd.DataFrame(my_dict)
# Create filter for month/months
month_filt = df['Experience'].str.contains('month')
# Filter DataFrame for rows that contain month/months
df['Months'] = df.loc[month_filt, 'Experience'].str.strip('month|months')
# Create filter for year/years
year_filt = df['Experience'].str.contains('year')
# Filter DataFrame for rows that contain year/years
df['Years'] = df.loc[year_filt, 'Experience'].str.strip('year|years')
# Fill NaN in Years column
df.loc[df['Years'].isna(),'Years'] = np.nan
# Convert Years to months
df.loc[df['Months'].isna(),'Months'] = df['Years'].astype('float') * 12
# Set years greater than 55 to NaN
df.loc[df['Years'].astype('float') > 55, 'Months'] = np.nan
Experience Months Years
0 5 month 5 NaN
1 nan NaN NaN
2 1 months 1 NaN
3 8 month 8 NaN
4 17 months 17 NaN
5 8 year 96 8
6 11 years 132 11
7 1.7 year 20.4 1.7
8 3.1 years 37.2 3.1
9 15.7 months 15.7 NaN
10 18 year 216 18
11 2017.2 yearsNaN 2017.2
12 98.3 years NaN 98.3
13 68 year NaN 68