Search code examples
pandasexcelvlookup

Pandas VLOOKUP values that do not match


For example I have two tables:

w_results.xlsx:

enter image description here

           Item    Type  Score
0  mashpotatoes    food     30
1        daniel  person     99
2         paper   thing     24
3          john  person    195
4          lamp   thing      5
5       jessica  person     92
6         pizza    food     42
7          meat    food      2
8         jacob  person     83

w_runnable.xlsx:

enter image description here

           Item    Type  Score    
0  mashpotatoes    food         30
1          john  person        195
2     raspberry    food         50
3        iphone   thing         75
4         boat    thing          5
5          jake  person         25
6         pizza    food         42
7          meat    food          2
8        daniel  person         99

How can I get a table of all values that appear on w_results 'Item' column but not in w_runnable "Item" column? I know that using pd.merge is probably the way to go, but it just outputs the matches. My attempt below:

import pandas as pd
w_results = 'w_results.xlsx'
w_runnable = 'w_runnable.xlsx'

df_results_mylist = pd.read_excel(w_results, sheet_name='my_list')
df_runnable_mylist = pd.read_excel(w_runnable, sheet_name='my_list')

left_join = pd.merge(df_results_mylist,
                     df_runnable_mylist,
                     on = 'Item',
                     how = 'left')

print(left_join)

Output:

      Item  Type_x  Score  Type_y  Score    
0  mashpotatoes    food     30    food       30.0
1        daniel  person     99  person       99.0
2         paper   thing     24     NaN        NaN
3          john  person    195  person      195.0
4          lamp   thing      5     NaN        NaN
5       jessica  person     92     NaN        NaN
6         pizza    food     42    food       42.0
7          meat    food      2    food        2.0
8         jacob  person     83     NaN        NaN

The values that do not match appear as NaN (paper, lamp, jessica, jacob). Don't know if there's a cleaner way to do this, so that these values appears in a separate table. Any help would be appreciated.

Desired output:

enter image description here


Solution

  • Code

    make Item column and Type column to list in Series

    cols = ['Item', 'Type']
    df_results_mylist[cols].agg(list, axis=1)
    

    result:

    0    [mashpotatoes, food]
    1        [daniel, person]
    2          [paper, thing]
    3          [john, person]
    4           [lamp, thing]
    5       [jessica, person]
    6           [pizza, food]
    7            [meat, food]
    8         [jacob, person]
    dtype: object
    

    create condition using isin function in similar way

    cond = df_results_mylist[cols].agg(list, axis=1).isin(df_runnable_mylist[cols].agg(list, axis=1))
    

    cond :

    0     True
    1     True
    2    False
    3     True
    4    False
    5    False
    6     True
    7     True
    8    False
    dtype: bool
    

    If it is necessary to check equality only in Item column, create condition with code below instead of code above.

    cond = df_results_mylist['Item'].isin(df_runnable_mylist['Item'].unique())
    


    no matter how cond was made, lets do boolean indexing by ~cond

    df_results_mylist[~cond]
    

    output:

        Item    Type    Score
    2   paper   thing   24
    4   lamp    thing   5
    5   jessica person  92
    8   jacob   person  83