I am working with time column like below:
import pandas as pd
# initialise data of lists.
data = {'Name':['Tom', 'nick', 'krish', 'jack'],
'Time':['3 years 4 months in role 20 years in company', '1 year 7 months in role 2 years 2 months in company',
'2 years 1 month in role 7 years 2 months in company', '3 months in role 8 years 2 months in company']}
# Create DataFrame
df = pd.DataFrame(data)
# Print the output.
df
I need to split the time column into two columns time_in_role(months) and time_in_company(months). And calculate the appropriate time in months.
Sample input:
Output expected:
Any help is appreciated.
You can use regular expressions to get years/months from the string. Afterwards it's just simple math:
role = (
df["Time"]
.str.extract(r"(?:(?P<y>\d+) years?)?\s*(?:(?P<m>\d+) months?)? in role")
.astype(float)
.fillna(0)
)
company = (
df["Time"]
.str.extract(r"(?:(?P<y>\d+) years?)?\s*(?:(?P<m>\d+) months?)? in company")
.astype(float)
.fillna(0)
)
df["Time_in_Role(months)"] = (role["y"] * 12 + role["m"]).astype(int)
df["Time_in_Company(months)"] = (company["y"] * 12 + company["m"]).astype(int)
print(df)
Prints:
Name Time Time_in_Role(months) Time_in_Company(months)
0 Tom 3 years 4 months in role 20 years in company 40 240
1 nick 1 year 7 months in role 2 years 2 months in company 19 26
2 krish 2 years 1 month in role 7 years 2 months in company 25 86
3 jack 3 months in role 8 years 2 months in company 3 98