Search code examples
pythonpandasdataframeindexingmulti-index

Indexing with multiindex dataframe in pandas


Consider the following example data:

data = {"Taxon": ["Firmicutes"]*5,
        "Patient": range(5),
        "Tissue": np.random.randint(0, 1000, size=5),
        "Stool": np.random.randint(0, 1000, size=5)}

df = pd.DataFrame(data).set_index(["Taxon", "Patient"])
print(df)

                    Stool  Tissue
Taxon      Patient               
Firmicutes 0          740     389
           1          786     815
           2          178     265
           3          841     484
           4          211     534

So, How can I query the dataframe only with the second level index Patient only? For example, I'd like to know all the data with respect to Patient 2.

I've tried data[data.index.get_level_values(1)==2], and it worked fine. But is there any way to achieve the same with one these (loc,iloc or ix) indexing methods?


Solution

  • I think the simpliest is use xs:

    np.random.seed(100)
    names = ['Taxon','Patient']
    mux = pd.MultiIndex.from_product([['Firmicutes', 'another'], range(1, 6)], names=names)
    df = pd.DataFrame(np.random.randint(10, size=(10,2)), columns=['Tissue','Stool'], index=mux)
    print (df)
                        Tissue  Stool
    Taxon      Patient               
    Firmicutes 1             8      8
               2             3      7
               3             7      0
               4             4      2
               5             5      2
    another    1             2      2
               2             1      0
               3             8      4
               4             0      9
               5             6      2
    

    print (df.xs(2, level=1))
                Tissue  Stool
    Taxon                    
    Firmicutes       3      7
    another          1      0
    
    #if need also level Patient
    print (df.xs(2, level=1, drop_level=False))
                        Tissue  Stool
    Taxon      Patient               
    Firmicutes 2             3      7
    another    2             1      0
    

    Solution with loc - is possible specify axis:

    print (df.loc(axis=0)[:,2])
                        Tissue  Stool
    Taxon      Patient               
    Firmicutes 2             3      7
    another    2             1      0