Search code examples
panelstataranking

Ranking aggregated values in panel data


I have an unbalanced panel data set with daily data similar to this, for n countries:

quarter date id trade trade_quarterly rank       i
 1       1    1    1    2               1        10
 1       2    1    1    2               1        17
 1       1    2    1    1               2        12

 2       1    1    0    1               1        5
 2       2    1    1    1               1        9
 2       1    2    0    1               1        14
 2       2    2    1    1               1        8
 2       2    3    0    0               3        6

Given are the first 4 columns.

Interested in information i, I would now like to keep only the 2 most traded ids for each quarter. I aggregated quarterly trades with

bysort quarter id: egen trade_quarterly =sum(trade) 

to get column 5.

To calculate column 6, I tried using

bysort quarter id : egen xx =rank(trade_quarterly), "option" 

which does not appear to produce the correct solution.

(Note that since the values are aggregated within ids ranking with rank(xx), field would produce a wrong rank for the following id)


Solution

  • The last line of syntax

    bysort quarter id : egen xx =rank(trade_quarterly), option 
    

    is not legal, as the literal text option is itself not an option. More generally, egen, rank() can not help here with your present data structure.

    But consider this, just a matter of a collapse to sums (totals) and then keeping only the largest two (the last two after sorting) within cross-combinations:

    clear 
    input quarter date id trade 
     1       1    1    1    2   
     1       2    1    1    2   
     1       1    2    1    1   
     2       1    1    0    1   
     2       2    1    1    1   
     2       1    2    0    1   
     2       2    2    1    1   
     2       2    3    0    0   
    end 
    collapse (sum) trade, by(quarter id) 
    bysort quarter (trade) : keep if (_N - _n) < 2 
    list, sepby(id quarter) 
    
         +----------------------+
         | quarter   id   trade |
         |----------------------|
      1. |       1    2       1 |
         |----------------------|
      2. |       1    1       2 |
         |----------------------|
      3. |       2    1       1 |
         |----------------------|
      4. |       2    2       1 |
         +----------------------+
    

    If you don't want to collapse, then extra technique is to tag each id-quarter pair just once when ranking.

    clear 
    input quarter date id trade 
     1       1    1    1    2   
     1       2    1    1    2   
     1       1    2    1    1   
     2       1    1    0    1   
     2       2    1    1    1   
     2       1    2    0    1   
     2       2    2    1    1   
     2       2    3    0    0   
    end 
    egen sum = total(trade), by(quarter id) 
    egen tag = tag(quarter id) 
    bysort tag quarter (trade) : gen tokeep = tag & (_N - _n) < 2 
    bysort quarter id (tokeep) : replace tokeep = tokeep[_N] 
    list if tokeep, sepby(quarter) 
    
         +--------------------------------------------------+
         | quarter   date   id   trade   sum   tag   tokeep |
         |--------------------------------------------------|
      1. |       1      2    1       1     2     0        1 |
      2. |       1      1    1       1     2     1        1 |
      3. |       1      1    2       1     1     1        1 |
         |--------------------------------------------------|
      4. |       2      2    1       1     1     0        1 |
      5. |       2      1    1       0     1     1        1 |
      6. |       2      2    2       1     1     0        1 |
      7. |       2      1    2       0     1     1        1 |
         +--------------------------------------------------+
    

    Note, in agreement with @William Lisowski's comment, that the largest two may not be uniquely identifiable in the presence of ties.