Search code examples
pythonpandasmulti-index

Pandas Multiindex get values from first entry of index


I have the following multiindex dataframe:

from io import StringIO
import pandas as pd
datastring = StringIO("""File,no,runtime,value1,value2
    A,0, 0,12,34
    A,0, 1,13,34
    A,0, 2,23,34
    A,1, 6,23,38
    A,1, 7,22,38
    B,0,17,15,35
    B,0,18,17,35
    C,0,34,23,32
    C,0,35,21,32
    """)    
df = pd.read_csv(datastring, sep=',')
df.set_index(['File','no',df.index], inplace=True)


>> df
               runtime  value1  value2
File   no               
A      0    0     0       12      34
            1     1       13      34
            2     2       23      34
       1    3     6       23      38
            4     7       22      38
B      0    5     17      15      35
            6     18      17      35
C      0    7     34      23      32
            8     35      21      32

What I would like to get is just the first values of every entry with a new file and a different number

A 0 34
A 1 38
B 0 35
C 0 32

The most similar questions I could find where these

Resample pandas dataframe only knowing result measurement count

MultiIndex-based indexing in pandas

Select rows in pandas MultiIndex DataFrame

but I was unable to construct a solution from them. The best I got was the ix operation, but as the values technically are still there (just not on display), the result is

idx = pd.IndexSlice
df.loc[idx[:,0],:]

could, for example, filter for the 0 value but would still return the entire rest of the dataframe.

Is a multiindex even the right tool for the task at hand? How to solve this?


Solution

  • Use GroupBy.first by first and second level of MultiIndex:

    s = df.groupby(level=[0,1])['value2'].first()
    print (s)
    File  no
    A     0     34
          1     38
    B     0     35
    C     0     32
    Name: value2, dtype: int64
    

    If need one column DataFrame use one element list:

    df1 = df.groupby(level=[0,1])[['value2']].first()
    print (df1)
             value2
    File no        
    A    0       34
         1       38
    B    0       35
    C    0       32
    

    Another idea is remove 3rd level by DataFrame.reset_index and filter by Index.get_level_values with boolean indexing:

    df2 = df.reset_index(level=2, drop=True)
    s = df2.loc[~df2.index.duplicated(), 'value2']
    print (s)
    File  no
    A     0     34
          1     38
    B     0     35
    C     0     32
    Name: value2, dtype: int64