Search code examples
python-3.xpandasdataframelist-comprehension

How to get dynamically subset of data frame by list comprehension?


My intend

I like to filter a data frame dynamically by a dict that is holding a list of dicts with column and value information to filter on.

Expected result should look like this:

result = {'brand': ['Honda','Honda','Toyota'],
              'model': ['a','b','d']
            }

My data frame

cars = {'brand': ['Honda','Honda','Honda','Toyota','Toyota','Audi'],
              'model': ['a','b','c','d','a','b'],
            }

My dict

Is holding information about relation between cars:

relation = {
    'relation1':[
        {'brand':'Honda','model':'a'},
        {'brand':'Honda','model':'b'},
        {'brand':'Toyota','model':'d'}
    ],
    'relation2':[
        {'brand':'...','model':'...'},
        {'brand':'...','model':'...'}
    ]
}

My try to setup list comprehension

' | '.join([f'{repr(k)}=={repr(v)}' for d in relation['relation1'] for k,v in d.items()])

Issues

  1. Implement filter will give an error:

    KeyError: "brand=='Honda' | model=='a' | brand=='Honda' | model=='b' | brand=='Toyota' | model=='d'"
    
  2. Adjusting filter to combine the dict values inside with & and outside with | so filter looks like:

    (brand=='Honda' & model=='a') | (brand=='Honda' & model=='b') | (brand=='Toyota' & model=='d')
    

Solution

  • Use nested list comprehension:

    q = ' | '.join([f"({' & '.join(f'{k}=={repr(v)}' for k,v in d.items())})" 
                    for d in relation['relation1']])
    print (q)
    (brand=='Honda' & model=='a') | 
    (brand=='Honda' & model=='b') | 
    (brand=='Toyota' & model=='b')
    
    
    df = df.query(q)
    print (df)
        brand model
    0   Honda     a
    1   Honda     b
    3  Toyota     d