Ranking within groups

I have data in Stata that look like this:

State	Year	Revenue	Rank
A	2019	30	1
A	2019	30	1
A	2019	40	2
A	2020	45	1
A	2020	50	2
B	2019	35	1
B	2019	45	2
B	2020	22	1
B	2020	40	2

The rank column above is what I would like to achieve. Please note that there could be rows like the first and second one that are duplicates in State, Year and Revenue. I want the same rank to be given for these two rows. I basically want ranking within each state and year. I tried group() but it did not give the desired result.

Solution

You're at liberty to call this ranking, but it doesn't correspond to

1. what Stata supports with its egen, rank() function
2. what it supports with its egen, group() function
3. ranking in any strict statistical sense, whereby to a first approximation n observations are ranked 1 to n, or vice versa.
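
To make the first two points concrete, here is a small sketch on a cut-down version of the data (state A only). The variable names r and g are made up for illustration. With ties, egen's rank() assigns the average rank by default, and egen's group() numbers distinct combinations across the whole dataset rather than restarting within each state and year.

```stata
clear
input str1 state int year byte revenue
"A" 2019 30
"A" 2019 30
"A" 2019 40
"A" 2020 45
"A" 2020 50
end

* rank() within state and year: tied values share the average rank,
* so A 2019 gets 1.5, 1.5, 3 and not 1, 1, 2
egen r = rank(revenue), by(state year)

* group() on all three variables: distinct combinations are numbered
* 1, 1, 2, 3, 4 over the whole dataset, with no restart in 2020
egen g = group(state year revenue)

list, sepby(state year)
```

Neither result matches the rank column asked for, which restarts at 1 in each state and year and gives tied values the same integer.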

No matter, as what you want requires only one command line.

* Example generated by -dataex-. For more info, type help dataex
clear
input str1 state int year byte(revenue rank)
"A" 2019 30 1
"A" 2019 30 1
"A" 2019 40 2
"A" 2020 45 1
"A" 2020 50 2
"B" 2019 35 1
"B" 2019 45 2
"B" 2020 22 1
"B" 2020 40 2
end

bysort state year (revenue) : gen wanted = sum(revenue != revenue[_n-1])

list, sepby(state year)

     +----------------------------------------+
     | state   year   revenue   rank   wanted |
     |----------------------------------------|
  1. |     A   2019        30      1        1 |
  2. |     A   2019        30      1        1 |
  3. |     A   2019        40      2        2 |
     |----------------------------------------|
  4. |     A   2020        45      1        1 |
  5. |     A   2020        50      2        2 |
     |----------------------------------------|
  6. |     B   2019        35      1        1 |
  7. |     B   2019        45      2        2 |
     |----------------------------------------|
  8. |     B   2020        22      1        1 |
  9. |     B   2020        40      2        2 |
     +----------------------------------------+

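That is, you bump up the result every time you see a different value: the expression revenue != revenue[_n-1] evaluates to 1 when true and 0 when false, and sum() returns the running sum of those values. This works for the first observation in any group because, under by:, the tacit reference to the value in observation 0 results in missing, which is different from the (nonmissing) value in the first observation.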
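
If the one-liner seems too terse, it can be unpacked into two steps. This is only a sketch, assuming the example data and the variable wanted created above are still in memory; the names newvalue and wanted2 are made up for illustration.

```stata
* Step 1: within state and year, sorted on revenue, flag each observation
* whose revenue differs from the previous one (1 = new value, 0 = repeat)
bysort state year (revenue) : gen byte newvalue = revenue != revenue[_n-1]

* Step 2: the running sum of the flags numbers the distinct values 1, 2, ...
by state year : gen wanted2 = sum(newvalue)

* Check against the one-step result and the rank column in the question
assert wanted2 == wanted
assert wanted2 == rank
```

One caveat: if revenue were missing in the first observation of a group, the comparison with observation 0 would be missing against missing, which counts as equal, so the count for that group would start at 0. Missing values sort last, so that can only happen when every revenue in the group is missing.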