Consider the following.
import pandas as pd
d=pd.DataFrame([[1,'a'],[1,'b'],[2,'c'],[2,'a'],[3,'c'],[4,'a'],[4,'c']],columns=['A','B'])
I want those values in A which have EXACTLY 'c' ('c' and only 'c') associated with them. There is only one such value. It is 3. I wrote the following query, but it is not returning the right result.
d[ d.B.isin(['c']) & ~d.A.isin(d[d.B.isin(set(d.B.unique())-{'c'})].A.to_frame()) ].A.to_frame()
My idea is to find all values in A that have 'c' associated with them and then remove from them those values that also have something other than 'c' associated with them. But what the code returns is just the values that have 'c' associated with them. Can someone help me with this? Thank you.
The simplest idea is filter values with c
and not allow duplicates in A
column:
s1 = d.loc[d.B.eq('c') & ~d.A.duplicated(keep=False), 'A']
print (s1)
4 3
Name: A, dtype: int64
Your solution works by removing .to_frame()
, but better is to use loc
for select by mask - evaluation order matters:
s2 = d.loc[ d.B.isin(['c']) & ~d.A.isin(d.loc[d.B.isin(set(d.B.unique())-{'c'}), 'A']), 'A']
print (s2)
4 3
Name: A, dtype: int64
s2 = d[ d.B.isin(['c']) & ~d.A.isin(d[d.B.isin(set(d.B.unique())-{'c'})].A) ].A
print (s2)
4 3
Name: A, dtype: int64