Consider the following data frame.
import Pandas as pd
df = pd.Dataframe
df = pd.DataFrame()
df['Folder'] = [2,3,4,5 ,2,4,5, 2,3,4, 2,3,4,5,1]
df['Country'] = ['USA','USA','USA','USA' ,'Mexico','Mexico','Mexico', 'UK','UK','UK', 'Canada','Canada','Canada','Canada','Canada']
df['Data'] = [20,30,43,15 ,25,44,15, 26,37,47, 24,34,47,55,18]
df.set_index(['Country','Folder'], drop=True, inplace=True)
df
Data
Country Folder
USA 2 20
3 30
4 43
5 15
Mexico 2 25
4 44
5 15
UK 2 26
3 37
4 47
Canada 2 24
3 34
4 47
5 55
1 18
How do I collect the rows where Folder has all of lst=[1,3,4] in level Folder?
Data
Country Folder
Canada 2 24
3 34
4 47
5 55
1 18
OR
Data
Country Folder
Canada 3 34
4 47
1 18
Either would work for me. I want to know that Canada matches all of lst. lst may be up to 8 items long.
I have tried df.query("Folder in @lst") however that returns rows matching any of lst. I need matching All of lst.
Thanks in advance for any help.
This is an alternative to @jezrael's approach, where we group on the boolean values from isin
and country:
In [38]: (df.groupby([df.index.isin([1,3,4], level='Folder'),
df.index.get_level_values('Country')])
.filter(lambda x: len(x)==3)
)
Out[38]:
Data
Country Folder
Canada 3 34
4 47
1 18
Take advantage of the fact that you have three numbers in the list, so if it matches all, then it should be 3.
To get all values, you could chunk the steps:
mapping = df.index.isin([1,3,4], level = 'Folder')
filtered = (pd.Series(mapping)
.groupby(df.index.get_level_values('Country'))
.transform(lambda x: sum(x)>=3)
)
In [61]: df.loc[filtered.array]
Out[61]:
Data
Country Folder
Canada 2 24
3 34
4 47
5 55
1 18