pandas dataframe multi-index

Multiple random selection from MultiIndex


Consider the following DataFrame:

import numpy as np
import pandas as pd
arrays = [['A','A','B','B','C','C'],[1,1,3,3,5,5],[2,2,4,4,6,6],[0.1,0.2,0.3,0.4,0.5,0.6]]
index = pd.MultiIndex.from_arrays(arrays,names=('Sample','P1','P2','T'))
data = np.random.rand(10,6)  # 10 rows, one column per entry in the MultiIndex
df = pd.DataFrame(columns=index,data=data)

I want to select the column with T=0.2 for sample A and the column with T=0.5 for sample C.

I can select each of these columns on its own, e.g.:

df.loc[:,('A',slice(None),slice(None),0.2)]  # or
df.loc(axis=1)[('C',slice(None),slice(None),0.5)] 

But how can I combine them? I tried supplying a list of tuples:

df.loc[:,[('A',slice(None),slice(None),0.2),('C',slice(None),slice(None),0.5)]]

But that generates an error.

How can I select my columns without resorting to pd.concat?


Solution

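  • Use boolean indexing: drop the P1 and P2 levels from the columns, then keep the columns whose remaining (Sample, T) pair is in the wanted list.

    out = df.loc[:, df.columns.droplevel([1, 2]).isin([('A', 0.2), ('C', 0.5)])]

    out: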

    Sample         A         C
    P1             1         5
    P2             2         6
    T            0.2       0.5
    0       0.836079  0.368242
    1       0.870087  0.520477
    2       0.582020  0.105908
    3       0.736918  0.324141
    4       0.386489  0.613063
    5       0.969809  0.358152
    6       0.325047  0.958949
    7       0.995300  0.474698
    8       0.674752  0.949571
    9       0.622846  0.878193
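
The one-liner above can be unpacked into separate steps, which may make it easier to see what the mask looks like. This is a minimal sketch that rebuilds the DataFrame from the question. The seeded generator and the names reduced, wanted, mask and alt_mask are illustrative choices, not part of the original answer. The second half shows an alternative that compares individual levels with get_level_values instead of dropping levels.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (seeded here only for repeatability).
arrays = [['A', 'A', 'B', 'B', 'C', 'C'],
          [1, 1, 3, 3, 5, 5],
          [2, 2, 4, 4, 6, 6],
          [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]]
index = pd.MultiIndex.from_arrays(arrays, names=('Sample', 'P1', 'P2', 'T'))
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 6)), columns=index)

# Step 1: keep only the levels we want to match on (Sample and T).
reduced = df.columns.droplevel(['P1', 'P2'])

# Step 2: test each remaining (Sample, T) pair against the wanted pairs.
wanted = [('A', 0.2), ('C', 0.5)]
mask = reduced.isin(wanted)  # boolean array, one entry per column

# Step 3: select the matching columns; the full 4-level header is preserved.
out = df.loc[:, mask]

# Alternative: build the same mask by comparing individual levels.
sample = df.columns.get_level_values('Sample')
t = df.columns.get_level_values('T')
alt_mask = ((sample == 'A') & (t == 0.2)) | ((sample == 'C') & (t == 0.5))
alt_out = df.loc[:, alt_mask]
```

Both masks select the same two columns. The isin form scales more comfortably when the list of wanted pairs grows, while the level-by-level comparison can be easier to extend with inequalities. Note that both rely on exact float equality for T, so values that come from arithmetic rather than literals may need np.isclose instead.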