Search code examples
pythonpandasdataframedata-analysis

Filter record from one data frame based on column values in second data frame in python


I have two DataFrame df1 and df2.

df1 is the original dataset and df2 is the dataset made from df1 after some manipulation.

In df1 I have column 'log' and in df2 I have column 'log1' and 'log2' two columns.

where the values in columns 'log1' and 'log2' contains in column 'log' in df1.

df2 sample below

date  id     log1    log2
1   uu1q    (2,4)   (3,5)
1   uu1q    (2,4)   (7,6)
1   uu1q    (3,5)   (7,6)
5   u25a    (4,7)   (3,9)
5   uu25a   (1,9)   (3,9)
6   ua3b7   (1,1)   (2,2)
6   ua3b7   (1,1)   (3,3)
6   ua3b7   (2,2)   (3,3)

df1 column sample with data below

date id     log    name   col1  col2
1   uu1q    (2,4)   xyz   1123  qqq
1   uu1q    (3,5)   aas   2132  wew
1   uu1q    (7,6)   wqas  2567  uuo
5   u25a    (4,7)   enj   666   ttt
5   fff     (0,0)   ddd   0     lll

Now I want to take fetch/filter all the records from df1 based on column values for each row in df2 i.e. based on 'date', 'id', 'log1' or 'log2' and compare it with columns in df1 i.e. 'date', 'id', 'log'.

NOTE: values columns 'log1' and 'log2' contained in single column 'log'


Solution

  • IIUC, you're looking for a chained isin:

    out = df1[df1['date'].isin(df2['date']) & df1['id'].isin(df2['id']) & (df1['log'].isin(df2['log1']) | df1['log'].isin(df2['log2']))]
    

    Output:

       date    id    log  name  col1 col2
    0     1  uu1q  (2,4)   xyz  1123  qqq
    1     1  uu1q  (3,5)   aas  2132  wew
    2     1  uu1q  (7,6)  wqas  2567  uuo
    3     5  u25a  (4,7)   enj   666  ttt