Search code examples
pythonpandas

Pandas: How to custom-sort on multiple columns?


I have a pandas dataframe with data like:

+-----------+-----------------+---------+
| JOB-NAME  |  Status         | SLA     |
+-----------+-----------------+---------+
| job_1     | YET_TO_START    | ---     |
| job_3     | COMPLETED       | MET     |
| job_4     | RUNNING         | MET     |
| job_2     | YET_TO_START    | LATE    |
| job_6     | RUNNING         | LATE    |
| job_5     | FAILED          | LATE    |
| job_7     | YET_TO_START    | ---     |
| job_8     | COMPLETED       | NOT_MET |
+-----------+-----------------+---------+

I need to sort this table based on the Status and SLA states, like for Status: FAILED will be top on the table, then YET_TO_START, then RUNNING, and finally COMPLETED. Similarly for SLA the order will be LATE, ---, NOT_MET, and MET. Like this:

+-----------+-----------------+---------+
| JOB-NAME  |  Status         | SLA     |
+-----------+-----------------+---------+
| job_5     | FAILED          | LATE    |
| job_2     | YET_TO_START    | LATE    |
| job_1     | YET_TO_START    | ---     |
| job_7     | YET_TO_START    | ---     |
| job_6     | RUNNING         | LATE    |
| job_4     | RUNNING         | MET     |
| job_8     | COMPLETED       | NOT_MET |
| job_3     | COMPLETED       | MET     |
+-----------+-----------------+---------+

I am able to do this custom sorting priority-based only on single column Status, but unable to do for multiple columns.

sort_order_dict = {"FAILED":0, "YET_TO_START":1, "RUNNING":2, "COMPLETED":3}
joined_df = joined_df.sort_values(by=['status'], key=lambda x: x.map(sort_order_dict))

A solution is given here, but its for single column, not multiple column.


Solution

  • You can extend dictionary by values from another columns, only necessary different keys in both columns for correct working like mentioned mozway in comments:

    sort_order_dict = {"FAILED":0, "YET_TO_START":1, "RUNNING":2, "COMPLETED":3,
                       "LATE":4, "---":5, "NOT_MET":6, "MET":7}
    df = df.sort_values(by=['Status','SLA'], key=lambda x: x.map(sort_order_dict))
    print (df)
      JOB-NAME        Status      SLA
    5    job_5        FAILED     LATE
    3    job_2  YET_TO_START     LATE
    0    job_1  YET_TO_START      ---
    6    job_7  YET_TO_START      ---
    4    job_6       RUNNING     LATE
    2    job_4       RUNNING      MET
    7    job_8     COMPLETED  NOT_MET
    1    job_3     COMPLETED      MET
    

    Or use ordered Categorical:

    df['Status'] = pd.Categorical(df['Status'], ordered=True, 
                                  categories=['FAILED', 'YET_TO_START', 'RUNNING', 'COMPLETED'])
                                              
    df['SLA'] = pd.Categorical(df['SLA'], ordered=True, 
                                  categories= ['LATE', '---', 'NOT_MET', 'MET'])
    df = df.sort_values(by=['Status','SLA'])
    print (df)
      JOB-NAME        Status      SLA
    5    job_5        FAILED     LATE
    3    job_2  YET_TO_START     LATE
    0    job_1  YET_TO_START      ---
    6    job_7  YET_TO_START      ---
    4    job_6       RUNNING     LATE
    2    job_4       RUNNING      MET
    7    job_8     COMPLETED  NOT_MET
    1    job_3     COMPLETED      MET