Consider the following example data:
data = {"Taxon": ["Firmicutes"]*5,
"Patient": range(5),
"Tissue": np.random.randint(0, 1000, size=5),
"Stool": np.random.randint(0, 1000, size=5)}
df = pd.DataFrame(data).set_index(["Taxon", "Patient"])
print(df)
Stool Tissue
Taxon Patient
Firmicutes 0 740 389
1 786 815
2 178 265
3 841 484
4 211 534
So, How can I query the dataframe only with the second level index Patient
only? For example, I'd like to know all the data with respect to Patient 2
.
I've tried data[data.index.get_level_values(1)==2]
, and it worked fine. But is there any way to achieve the same with one these (loc
,iloc
or ix
) indexing methods?
I think the simpliest is use xs
:
np.random.seed(100)
names = ['Taxon','Patient']
mux = pd.MultiIndex.from_product([['Firmicutes', 'another'], range(1, 6)], names=names)
df = pd.DataFrame(np.random.randint(10, size=(10,2)), columns=['Tissue','Stool'], index=mux)
print (df)
Tissue Stool
Taxon Patient
Firmicutes 1 8 8
2 3 7
3 7 0
4 4 2
5 5 2
another 1 2 2
2 1 0
3 8 4
4 0 9
5 6 2
print (df.xs(2, level=1))
Tissue Stool
Taxon
Firmicutes 3 7
another 1 0
#if need also level Patient
print (df.xs(2, level=1, drop_level=False))
Tissue Stool
Taxon Patient
Firmicutes 2 3 7
another 2 1 0
Solution with loc
- is possible specify axis
:
print (df.loc(axis=0)[:,2])
Tissue Stool
Taxon Patient
Firmicutes 2 3 7
another 2 1 0