For example I have two tables:
w_results.xlsx
:
Item Type Score
0 mashpotatoes food 30
1 daniel person 99
2 paper thing 24
3 john person 195
4 lamp thing 5
5 jessica person 92
6 pizza food 42
7 meat food 2
8 jacob person 83
w_runnable.xlsx
:
Item Type Score
0 mashpotatoes food 30
1 john person 195
2 raspberry food 50
3 iphone thing 75
4 boat thing 5
5 jake person 25
6 pizza food 42
7 meat food 2
8 daniel person 99
How can I get a table of all values that appear on w_results 'Item' column but not in w_runnable "Item" column?
I know that using pd.merge
is probably the way to go, but it just outputs the matches. My attempt below:
import pandas as pd
w_results = 'w_results.xlsx'
w_runnable = 'w_runnable.xlsx'
df_results_mylist = pd.read_excel(w_results, sheet_name='my_list')
df_runnable_mylist = pd.read_excel(w_runnable, sheet_name='my_list')
left_join = pd.merge(df_results_mylist,
df_runnable_mylist,
on = 'Item',
how = 'left')
print(left_join)
Output:
Item Type_x Score Type_y Score
0 mashpotatoes food 30 food 30.0
1 daniel person 99 person 99.0
2 paper thing 24 NaN NaN
3 john person 195 person 195.0
4 lamp thing 5 NaN NaN
5 jessica person 92 NaN NaN
6 pizza food 42 food 42.0
7 meat food 2 food 2.0
8 jacob person 83 NaN NaN
The values that do not match appear as NaN
(paper, lamp, jessica, jacob).
Don't know if there's a cleaner way to do this, so that these values appears in a separate table. Any help would be appreciated.
Desired output:
Code
make Item column and Type column to list in Series
cols = ['Item', 'Type']
df_results_mylist[cols].agg(list, axis=1)
result:
0 [mashpotatoes, food]
1 [daniel, person]
2 [paper, thing]
3 [john, person]
4 [lamp, thing]
5 [jessica, person]
6 [pizza, food]
7 [meat, food]
8 [jacob, person]
dtype: object
create condition using isin
function in similar way
cond = df_results_mylist[cols].agg(list, axis=1).isin(df_runnable_mylist[cols].agg(list, axis=1))
cond :
0 True
1 True
2 False
3 True
4 False
5 False
6 True
7 True
8 False
dtype: bool
If it is necessary to check equality only in Item
column, create condition with code below instead of code above.
cond = df_results_mylist['Item'].isin(df_runnable_mylist['Item'].unique())
no matter how cond was made, lets do boolean indexing by ~cond
df_results_mylist[~cond]
output:
Item Type Score
2 paper thing 24
4 lamp thing 5
5 jessica person 92
8 jacob person 83