Search code examples
pythonpandasdata-scienceanalyticsdata-analysis

Need to select only those rows where at least one value is present from the list


Need to select only those rows where at least one value is present from the list.

I have tried to use isin function as below but it's returning those rows also which don't have any value from the list:-

df[(df.loc[:,'P_1':'P_90'].isin(list))]

Note:-P_1 to P_90 indicated range of columns. Sample data:- dib-cust_id|p_1|p_2|.......|p_90 345|1950|1860|..............|675

Note:- I can't provide exact data as it's confidential. Above is just an example how dataframe will look like with 91 columns.

Please let me know what is wrong in this method and is there any other way to get the desired result.


Solution

  • What you need is:

    df[df.loc[:,'P_1':'P_90'].isin(list())].dropna(how='all')
    

    Okay so this nans any non-matching values so you would actually need:

    df[df[df.loc[:,'P_1':'P_90'].isin(list())].dropna(how='all').index]
    

    Which is definitely not the most efficient.

    The reason why your current formula doesn't work is that you're basically returning an array of True/False with isin rather than a list. isin doesn't remove rows, it marks cells as True or False based on if the value is in the list given. So your array of True/False has the same index as the original df, and when you do df[array] it takes that array's index, aka the same index.

    Instead of giving True or False per cell, you want it per row - at this point pandas will filter the way you want.