Search code examples
pythonpandasdataframedictionaryapply

find value in column, which contains list and take another value from next column and put it in first table to new column


I have two tables

df1 = pd.DataFrame([{'a': 1}, {'a': 2}, {'a': 8}])
df1['b'] = ""

df2 = pd.DataFrame([{'e': [1,2,3], 'f': 1},{'e': [4,5,6], 'f': 2},{'e': [7,8,9], 'f': 3}])
       e  f

I would like to insert a value into df1['b'] from df2['f'] based on the condition that value in df['a'] is somewhere in df2['e']

Mostly I would use apply/map df1['a'].apply(lambda x: function)

I can get the specific value I need df2['f'].loc[df2['e'].apply(lambda y: value_to_look_for in y)].item()

but when I put it together df1['a'].apply(lambda x: (df2['f'].loc[df2['e'].map(lambda y:x in y)].item()))

I'm getting an error "ValueError: can only convert an array of size 1 to a Python scalar"

Can I ask for a solution, what am I missing?


Solution

  • I would explode, create a mapping Series, and map:

    df1['b'] = df1['a'].map(df2.explode('e').set_index('e')['f'])
    

    Output:

       a  b
    0  1  1
    1  2  1
    2  8  3
    

    Intermediate mapping Series:

    df2.explode('e').set_index('e')['f']
    
    e
    1    1
    2    1
    3    1
    4    2
    5    2
    6    2
    7    3
    8    3
    9    3
    Name: f, dtype: int64