Search code examples
pythonpandasdataframegoogle-colaboratory

Assign elements of a dataframe from skills registered in another dataframe


I have the following problem, on the one hand I have a dataframe with two columns, the name of the people and the technologies they handle:

import pandas as pd
import numpy as np

censo = pd.DataFrame({"Name":["Uriel","Ricardo","Rodrigo","Arion"], "Tec":[("Sas, Python"),("Python, Pyspark"),("Python, Tableau"),("Excel")]})
censo.head(10)

The output is something like this:

enter image description here

Then I have another dataframe that has two other columns: requests and technology that request requires:

tec = pd.DataFrame({"Request":["001","002","003","004"], "Tec":["Python","Sas","Tableau","Excel"]})
tec.head(5)

Whose output is the following:

enter image description here

What I want to achieve is to create a column called "Assignment" in which appears the name of the person from the first dataframe that makes the best fit with the technology requested in the request.

I tried converting the values of the technologies to dummies and then trying to make the match, but I'm not even close to what I need.

 a = censo['Tec'].str.get_dummies(sep=',')
 a = a.drop_duplicates()
 a = a.loc[~(a==0).all(axis=1)]
 censo_ = pd.concat([censo,a], axis = 1)
censo_.head()

enter image description here

tec['Assignment'] = censo['Name'].apply(lambda x: 'Python' in x)
tec.head()

Any ideas what I could do to fix this problem?

First of all, Thanks!


Solution

  • Check with str.get_dummies

    out = censo.join(censo['Tec'].str.get_dummies(','))
    

    Update

    out = censo.assign(Tec=censo.Tec.str.split(',')).explode('Tec').\
                merge(tec,how='left').groupby(['Request','Tec'])['Name'].agg(list)
    Request  Tec   
    001      Python    [Ricardo, Rodrigo]
    002      Sas                  [Uriel]
    004      Excel                [Arion]
    Name: Name, dtype: object