I have the following problem, on the one hand I have a dataframe with two columns, the name of the people and the technologies they handle:
import pandas as pd
import numpy as np
censo = pd.DataFrame({"Name":["Uriel","Ricardo","Rodrigo","Arion"], "Tec":[("Sas, Python"),("Python, Pyspark"),("Python, Tableau"),("Excel")]})
censo.head(10)
The output is something like this:
Then I have another dataframe that has two other columns: requests and technology that request requires:
tec = pd.DataFrame({"Request":["001","002","003","004"], "Tec":["Python","Sas","Tableau","Excel"]})
tec.head(5)
Whose output is the following:
What I want to achieve is to create a column called "Assignment" in which appears the name of the person from the first dataframe that makes the best fit with the technology requested in the request.
I tried converting the values of the technologies to dummies and then trying to make the match, but I'm not even close to what I need.
a = censo['Tec'].str.get_dummies(sep=',')
a = a.drop_duplicates()
a = a.loc[~(a==0).all(axis=1)]
censo_ = pd.concat([censo,a], axis = 1)
censo_.head()
tec['Assignment'] = censo['Name'].apply(lambda x: 'Python' in x)
tec.head()
Any ideas what I could do to fix this problem?
First of all, Thanks!
Check with str.get_dummies
out = censo.join(censo['Tec'].str.get_dummies(','))
Update
out = censo.assign(Tec=censo.Tec.str.split(',')).explode('Tec').\
merge(tec,how='left').groupby(['Request','Tec'])['Name'].agg(list)
Request Tec
001 Python [Ricardo, Rodrigo]
002 Sas [Uriel]
004 Excel [Arion]
Name: Name, dtype: object