Search code examples
pandasdataframemulti-index

how to perform an operation similar to group by on the first index of a multi indexed dataframe


The code to generate a sample dataframe is as follows

fruits=pd.DataFrame()
fruits['month']=['jan','feb','feb','march','jan','april','april','june','march','march','june','april']
fruits['fruit']=['apple','orange','pear','orange','apple','pear','cherry','pear','orange','cherry','apple','cherry']
fruits['price']=[30,20,40,25,30 ,45,60,45,25,55,37,60]

ind=(fruits.index)
fruits_grp = fruits.set_index(['month', ind],drop=False)

The output dataframe should look something like this:

fruits_new1=pd.DataFrame()
fruits_new1['month']=['jan','jan','feb','feb','march','march','march','apr','apr','apr','jun','jun']
fruits_new1['fruit']=['apple','apple','orange','pear','orange','orange','cherry','pear','cherry','cherry','pear','apple']
fruits_new1['price']=[30,30,20,40,25,25,55,45,60,60,45,37]
ind1=fruits_new1.index
fruits_grp1 = fruits_new1.set_index(['month', ind1],drop=False)
fruits_grp1

Thank you


Solution

  • use:

    d={'Jan': 0, 'Feb': 1, 'Mar': 2, 'Apr': 3, 'May': 4, 'Jun': 5, 'Jul': 6, 'Aug': 7, 'Sep': 8, 'Oct': 9, 'Nov': 10, 'Dec': 11}
    
    idx=fruits_grp['month'].str.title().str[:3].map(d).sort_values().index
    fruits_grp=fruits_grp.reindex(idx)
    fruits_grp['s']=list(range(len(fruits_grp)))
    fruits_grp=fruits_grp.set_index('s',append=True).droplevel(1).rename_axis(index=['month',None])
    

    Update:

    sample dataframe:

    fruits=pd.DataFrame()
    fruits['month']=[1,2,2,3,1,4,4,6,3,3,6,4]
    fruits['fruit']=['apple','orange','pear','orange','apple','pear','cherry','pear','orange','cherry','apple','cherry']
    fruits['price']=[30,20,40,25,30 ,45,60,45,25,55,37,60]
    
    ind=(fruits.index)
    fruits_grp = fruits.set_index(['month', ind],drop=False)
    

    Then just simply use:

    idx=fruits_grp['month'].sort_values().index
    fruits_grp=fruits_grp.reindex(idx)
    fruits_grp['s']=list(range(len(fruits_grp)))
    fruits_grp=fruits_grp.set_index('s',append=True).droplevel(1).rename_axis(index=['month',None])