pandas, dataframe, multi-index, shuffle

How to shuffle the outer index randomly and the inner index in a different random order in a MultiIndex DataFrame


The following is some code to generate a sample dataframe:

import pandas as pd

fruits = pd.DataFrame()
fruits['month'] = ['jan', 'feb', 'feb', 'march', 'jan', 'april', 'april', 'june', 'march', 'march', 'june', 'april']
fruits['fruit'] = ['apple', 'orange', 'pear', 'orange', 'apple', 'pear', 'cherry', 'pear', 'orange', 'cherry', 'apple', 'cherry']
ind = fruits.index                 # inner level: original row numbers
ind_mnth = fruits['month'].values  # outer level: month labels
fruits['price'] = [30, 20, 40, 25, 30, 45, 60, 45, 25, 55, 37, 60]
# Build the two-level index; 'month' also stays available as a column
fruits_grp = fruits.set_index([ind_mnth, ind], drop=False)

How can I shuffle the outer index randomly and the inner index in a different random order in this MultiIndex DataFrame?


Solution

  • Assuming this DataFrame with a MultiIndex as input:

              month   fruit  price
    jan   0     jan   apple     30
    feb   1     feb  orange     20
          2     feb    pear     40
    march 3   march  orange     25
    jan   4     jan   apple     30
    april 5   april    pear     45
          6   april  cherry     60
    june  7    june    pear     45
    march 8   march  orange     25
          9   march  cherry     55
    june  10   june   apple     37
    april 11  april  cherry     60
    

    First shuffle all rows of the DataFrame with sample(frac=1), then regroup the rows by month by selecting with .loc on the unique months in a random order:

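    import numpy as np

    np.random.seed(0)  # fixed seed, only for reproducibility
    # unique outer labels (months), shuffled in place
    idx0 = np.unique(fruits_grp.index.get_level_values(0))
    np.random.shuffle(idx0)
    # shuffle all rows, then select month by month in the shuffled order
    fruits_grp.sample(frac=1).loc[idx0]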
    

    output:

              month   fruit  price
    jan   0     jan   apple     30
          4     jan   apple     30
    april 6   april  cherry     60
          5   april    pear     45
          11  april  cherry     60
    feb   1     feb  orange     20
          2     feb    pear     40
    june  10   june   apple     37
          7    june    pear     45
    march 8   march  orange     25
          9   march  cherry     55
          3   march  orange     25
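
The same two-step idea (shuffle all rows, then regroup by a random order of the outer labels) can also be written as a small reusable function. This is only a minimal sketch, not the method used in the answer above: the function name `shuffle_outer_and_inner` and its `seed` parameter are hypothetical, and it regroups with a stable sort on random per-month ranks instead of `.loc` selection, on the assumption that a version-independent regrouping step is wanted.

```python
import numpy as np
import pandas as pd


def shuffle_outer_and_inner(df, seed=None):
    """Hypothetical helper: random order of level-0 groups, random row order inside each group."""
    rng = np.random.default_rng(seed)
    # Step 1: shuffle every row, which randomizes the inner order.
    shuffled = df.iloc[rng.permutation(len(df))]
    # Step 2: assign each outer label a random rank.
    outer = shuffled.index.get_level_values(0)
    labels = outer.unique()
    rank = dict(zip(labels, rng.permutation(len(labels))))
    keys = np.array([rank[label] for label in outer])
    # Step 3: a stable sort on the rank regroups rows by outer label
    # while preserving the shuffled inner order from step 1.
    return shuffled.iloc[np.argsort(keys, kind="stable")]


# Rebuild the sample frame from the question.
fruits = pd.DataFrame({
    "month": ["jan", "feb", "feb", "march", "jan", "april",
              "april", "june", "march", "march", "june", "april"],
    "fruit": ["apple", "orange", "pear", "orange", "apple", "pear",
              "cherry", "pear", "orange", "cherry", "apple", "cherry"],
    "price": [30, 20, 40, 25, 30, 45, 60, 45, 25, 55, 37, 60],
})
fruits_grp = fruits.set_index([fruits["month"].to_numpy(), fruits.index], drop=False)

result = shuffle_outer_and_inner(fruits_grp, seed=0)
print(result)
```

Passing a `seed` makes the result repeatable; leaving it as `None` gives a different order on each call. Because the sort in step 3 is stable, rows that share a month keep the relative order they received in step 1, so each month stays contiguous in the output.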