Need to select only those rows where at least one value is present from the list.
I have tried to use isin
function as below but it's returning those rows also which don't have any value from the list:-
df[(df.loc[:,'P_1':'P_90'].isin(list))]
Note:-
P_1 to P_90
indicated range of columns. Sample data:- dib-cust_id|p_1|p_2|.......|p_90 345|1950|1860|..............|675Note:- I can't provide exact data as it's confidential. Above is just an example how dataframe will look like with 91 columns.
Please let me know what is wrong in this method and is there any other way to get the desired result.
What you need is:
df[df.loc[:,'P_1':'P_90'].isin(list())].dropna(how='all')
Okay so this nan
s any non-matching values so you would actually need:
df[df[df.loc[:,'P_1':'P_90'].isin(list())].dropna(how='all').index]
Which is definitely not the most efficient.
The reason why your current formula doesn't work is that you're basically returning an array of True/False with isin
rather than a list. isin
doesn't remove rows, it marks cells as True
or False
based on if the value is in the list given. So your array of True
/False
has the same index as the original df, and when you do df[array]
it takes that array's index, aka the same index.
Instead of giving True
or False
per cell, you want it per row - at this point pandas will filter the way you want.