Search code examples
pythonpandasdataframesortinggroup-by

Group elements in dataframe and show them in chronological order


Consider the following dataframe, where Date is in the format DD-MM-YYY:

Date        Time      Table
01-10-2000  13:00:03  B
01-10-2000  13:00:04  A
01-10-2000  13:00:05  B
01-10-2000  13:00:06  A
01-10-2000  13:00:07  B
01-10-2000  13:00:08  A

How can I 1) group the observations by Table, 2) sort the rows according to Date and Time within each group, 3) show the groups in chronological order according to Date and Time of their first observation?

Date        Time      Table
01-10-2000  13:00:03  B
01-10-2000  13:00:05  B
01-10-2000  13:00:07  B
01-10-2000  13:00:04  A
01-10-2000  13:00:06  A
01-10-2000  13:00:08  A

Input data:

data = {
    'Date': ['01-10-2000', '01-10-2000', '01-10-2000', '01-10-2000', '01-10-2000', '01-10-2000'],
    'Time': ['13:00:03', '13:00:04', '13:00:05', '13:00:06', '13:00:07', '13:00:08'],
    'Table': ['B', 'A', 'B', 'A', 'B', 'A']
}
df = pd.DataFrame(data)

Solution

  • Use groupby.transform and numpy.lexsort:

    date = pd.to_datetime(df['Date']+' '+df['Time'])
    
    out = df.iloc[np.lexsort([
        date,
        df['Table'],
        date.groupby(df['Table']).transform('min')
    ])]
    

    Alternatively, using an intermediate column:

    date = pd.to_datetime(df['Date']+' '+df['Time'])
    
    out = (df.assign(date=date, min_date=date.groupby(df['Table']).transform('min'))
             .sort_values(by=['min_date', 'Table', 'date'])
             .drop(columns=['date', 'min_date'])
         )
    

    Output:

             Date      Time Table
    0  01-10-2000  13:00:03     B
    2  01-10-2000  13:00:05     B
    4  01-10-2000  13:00:07     B
    1  01-10-2000  13:00:04     A
    3  01-10-2000  13:00:06     A
    5  01-10-2000  13:00:08     A