Search code examples
pandasranking

Pandas Ranking based on multiple columns


I am trying to rank data based on several columns in ascending order.
Please see below data frame I am working on:

{'FACILITY': ['AAA', 'AAA', 'AAA', 'AAA', 'AAA'],
 'IN_DATE':
 ['2015-08-30 05:49:05',
  '2015-08-30 05:49:05',
  '2015-08-30 05:49:05',
  '2015-08-30 05:49:05',
  '2015-09-02 20:56:59'],
 'LOT':
 ['N123456', 'N654321', 'N654321', 'N123456', 'N123456'],
 'OPERATION':
 ['100', '100', '100', '100', '100'],
 'TXN_DATE':
 ['2015-08-30 06:04:03',
  '2015-08-30 05:59:57',
  '2015-08-30 06:37:32',
  '2015-08-30 06:30:01',
  '2015-09-02 21:39:44']

I am trying to create new column "ORDER" based on order within lot and operation in ascending order based on TXN_DATE.


Solution

  • You can get the sorted order using the rank method:

    In [11]: df
    Out[11]:
      FACILITY              IN_DATE      LOT OPERATION             TXN_DATE
    0      AAA  2015-08-30 05:49:05  N123456       100  2015-08-30 06:04:03
    1      AAA  2015-08-30 05:49:05  N123456       100  2015-08-30 05:59:57
    2      AAA  2015-08-30 05:49:05  N123456       100  2015-08-30 06:37:32
    3      AAA  2015-08-30 05:49:05  N123456       100  2015-08-30 06:30:01
    4      AAA  2015-09-02 20:56:59  N123456       100  2015-09-02 21:39:44
    
    In [12]: df["TXN_DATE"].rank()
    Out[12]:
    0    2
    1    1
    2    4
    3    3
    4    5
    Name: TXN_DATE, dtype: float64
    

    As a column:

    In [13]: df["ORDER"] = df["TXN_DATE"].rank()
    
    In [14]: df
    Out[14]:
      FACILITY              IN_DATE      LOT OPERATION             TXN_DATE  ORDER
    0      AAA  2015-08-30 05:49:05  N123456       100  2015-08-30 06:04:03      2
    1      AAA  2015-08-30 05:49:05  N123456       100  2015-08-30 05:59:57      1
    2      AAA  2015-08-30 05:49:05  N123456       100  2015-08-30 06:37:32      4
    3      AAA  2015-08-30 05:49:05  N123456       100  2015-08-30 06:30:01      3
    4      AAA  2015-09-02 20:56:59  N123456       100  2015-09-02 21:39:44      5
    

    Rank is also a Series groupby method:

    In [15]: df.groupby(["LOT", "OPERATION"])["TXN_DATE"].rank()
    Out[15]:
    0    2
    1    1
    2    4
    3    3
    4    5
    Name: (N123456, 100), dtype: float64
    

    Note: In this small example the name is from the only group, usually this would not have a name.