Search code examples
pythonpandasdataframeindexingmulti-index

Sorting rows in multi-index dataframe while keeping secondary index in tact


I have the following multi-index dataframe set up:

                              created_at  ... compound
conv_nr elem_nr                            ...         
0       0       2020-03-30 18:41:32+00:00  ...   0.7184
        1       2020-03-30 18:31:47+00:00  ...  -0.0003
1       0       2020-03-30 18:35:15+00:00  ...  -0.3612
        1       2020-03-30 18:34:00+00:00  ...   0.1877
        2       2020-03-30 18:29:36+00:00  ...  -0.1027
...                                   ...  ...      ...
29071   1       2019-05-22 12:58:12+00:00  ...   0.0000
29072   0       2019-05-22 13:20:31+00:00  ...  -0.6619
        1       2019-05-22 12:58:12+00:00  ...   0.0000
29073   0       2019-05-22 13:20:05+00:00  ...   0.7506
        1       2019-05-22 12:58:12+00:00  ...   0.0000

I am struggling to order this so that the time stamps are in ascending order and that "elem_nr" is also reindexed accordingly (stays in tact). Consider when conv_nr = 0. The outcome should then be:

                               created_at  ... compound
conv_nr elem_nr                            ...         
0       0       2020-03-30 18:31:47+00:00  ...  -0.0003
        1       2020-03-30 18:41:32+00:00  ...   0.7184

So essentially I need to get ascending timestamps while ensuring the "elem_nr" does not get flipped/stays in place.


Solution

  • IIUC try using .values or .to_numpy() to prevent pandas from affecting the index:

    import numpy as np
    import pandas as pd
    
    df = pd.DataFrame({
        'conv_nr': [0, 0, 1, 1, 1],
        'elem_nr': [0, 1, 0, 1, 2, ],
        'created_at': ['2020-03-30 18:41:32+00:00',
                       '2020-03-30 18:31:47+00:00',
                       '2020-03-30 18:35:15+00:00',
                       '2020-03-30 18:34:00+00:00',
                       '2020-03-30 18:29:36+00:00'],
        'compound': [0.7184, -0.0003, -0.3612, 0.1877, -0.1027]
    })
    df['created_at'] = pd.to_datetime(df['created_at'])
    df = df.set_index(['conv_nr', 'elem_nr'])
    
    # Use df.column accessor to select all columns
    # Use .values to aligning by index
    df[df.columns] = df.sort_values(
        ['conv_nr', 'created_at'], ascending=True
    ).values
    print(df)
    

    df:

                                   created_at  compound
    conv_nr elem_nr                                    
    0       0       2020-03-30 18:31:47+00:00   -0.0003
            1       2020-03-30 18:41:32+00:00    0.7184
    1       0       2020-03-30 18:29:36+00:00   -0.1027
            1       2020-03-30 18:34:00+00:00    0.1877
            2       2020-03-30 18:35:15+00:00   -0.3612