Search code examples
pythonpandasdataframe

Selecting data from Pandas dataframe based on criteria stored in a dict


I have a Pandas dataframe that contains a large number of variables. This can be simplified as:

tempDF = pd.DataFrame({ 'var1': [12,12,12,12,45,45,45,51,51,51],
                        'var2': ['a','a','b','b','b','b','b','c','c','d'],
                        'var3': ['e','f','f','f','f','g','g','g','g','g'],
                        'var4': [1,2,3,3,4,5,6,6,6,7]})

If I wanted to select a subset of the dataframe (e.g. var2='b' and var4=3), I would use:

tempDF.loc[(tempDF['var2']=='b') & (tempDF['var4']==3),:]

However, is it possible to select a subset of the dataframe if the matching criteria are stored within a dict, such as:

tempDict = {'var2': 'b','var4': 3}

It's important that the variable names are not predefined and the number of variables included in the dict is changeable.

I've been puzzling over this for a while and so any suggestions would be greatly appreciated.


Solution

  • You could create mask for each condition using list comprehension and then join them by converting to dataframe and using all:

    In [23]: pd.DataFrame([tempDF[key] == val for key, val in tempDict.items()]).T.all(axis=1)
    Out[23]:
    0    False
    1    False
    2     True
    3     True
    4    False
    5    False
    6    False
    7    False
    8    False
    9    False
    dtype: bool
    

    Then you could slice your dataframe with that mask:

    mask = pd.DataFrame([tempDF[key] == val for key, val in tempDict.items()]).T.all(axis=1)
    
    In [25]: tempDF[mask]
    Out[25]:
       var1 var2 var3  var4
    2    12    b    f     3
    3    12    b    f     3