How to sum a range of values in one column based on a range as defined by a multiindex


I'm well and truly stumped on this.

I have a MultiIndex dataframe that looks like this:

               data
index1 index2  
0      1       8
       2       7
       3       6
       4       9
1      1       3
       2       4
       3       3
       4       6
2      1       5
       2       5

... and so on.

I'm trying to sum values from the data column for each index1, restricted to a range of index2 values, and put the result in a new dataframe.

For example, summing the data values that correspond to the first 2 values of index2 for each index1 in the example above should give:

index1 summed_data
0      15
1      7
2      10

Does anyone know how to do this?


Solution

  • You don't need to change your input format. Group by the index1 level and sum the first two rows of each group:

    # .iloc[:2] takes the first two rows of each index1 group by position
    x = df.groupby(level='index1').agg({'data': lambda s: s.iloc[:2].sum()}).rename(columns={'data': 'summed_data'})
    

    Printing x then gives:

            summed_data
    index1             
    0                15
    1                 7
    2                10
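
The statement above picks rows by position within each group. If the range should instead be defined by the index2 labels themselves (say, every row with index2 between 1 and 2), one possible sketch is to build a boolean mask from that index level before grouping. The helper name `sum_index2_range` and its `low`/`high` parameters are made up for illustration, and the frame is rebuilt only from the rows shown in the question:

```python
import pandas as pd

# Rebuild the example frame from the question (only the rows shown there).
index = pd.MultiIndex.from_tuples(
    [(0, 1), (0, 2), (0, 3), (0, 4),
     (1, 1), (1, 2), (1, 3), (1, 4),
     (2, 1), (2, 2)],
    names=["index1", "index2"],
)
df = pd.DataFrame({"data": [8, 7, 6, 9, 3, 4, 3, 6, 5, 5]}, index=index)


def sum_index2_range(frame, low, high):
    """Hypothetical helper: sum `data` per index1 for index2 labels in [low, high]."""
    index2 = frame.index.get_level_values("index2")
    # Boolean mask over rows; does not rely on the index being sorted.
    mask = (index2 >= low) & (index2 <= high)
    return (
        frame.loc[mask, "data"]
        .groupby(level="index1")
        .sum()
        .to_frame("summed_data")
    )


result = sum_index2_range(df, 1, 2)
print(result)
```

For the sample data this matches the positional version, because the first two rows of every group carry the index2 labels 1 and 2. Note that an index1 group with no rows inside the requested range is simply absent from the result.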