Proportions by Year and State using egen Command

I am trying to generate a new variable that is equal to the share of winners by state for each year in Stata.

I am using the egen command and would like to know whether it is the appropriate command for this. My dataset is extremely large, so it is hard for me to check the results manually. I have created a dummy variable for each year, and award_winner is a binary variable equal to 1 if the business won the award that year and 0 if it did not.

sort state year_dummy*
by state year_dummy*: egen winner_bystate_year = mean(award_winner)

Solution

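This is easy enough to test with a small fake dataset in which the correct answers are clear. I don't know why you introduced dummy variables when you could work directly with year, but the answer is the same: each distinct combination of dummy values identifies exactly one year, so both approaches define the same groups.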

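* Build a 12-observation test dataset: 2 states x 2 years x 3 businesses
clear
set obs 12
gen state = cond(_n < 7, "A", "B")
egen year = seq(), from(2019) to(2020) block(3)
gen award_winner = real(word("0 0 0 0 0 1 0 1 1 1 1 1", _n))
gen order = _n
* Create the year dummies year1 and year2, as in the question
tab year, gen(year)

* Approach from the question: group by state and the year dummies
bysort state year?: egen suggested = mean(award_winner)

* Simpler equivalent: group by state and year directly
bysort state year: egen better = mean(award_winner)

* Restore the original order before listing
sort order
list, sepby(state year)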

     +-----------------------------------------------------------------------+
     | state   year   award_~r   order   year1   year2   sugges~d     better |
     |-----------------------------------------------------------------------|
  1. |     A   2019          0       1       1       0          0          0 |
  2. |     A   2019          0       2       1       0          0          0 |
  3. |     A   2019          0       3       1       0          0          0 |
     |-----------------------------------------------------------------------|
  4. |     A   2020          0       4       0       1   .3333333   .3333333 |
  5. |     A   2020          0       5       0       1   .3333333   .3333333 |
  6. |     A   2020          1       6       0       1   .3333333   .3333333 |
     |-----------------------------------------------------------------------|
  7. |     B   2019          0       7       1       0   .6666667   .6666667 |
  8. |     B   2019          1       8       1       0   .6666667   .6666667 |
  9. |     B   2019          1       9       1       0   .6666667   .6666667 |
     |-----------------------------------------------------------------------|
 10. |     B   2020          1      10       0       1          1          1 |
 11. |     B   2020          1      11       0       1          1          1 |
 12. |     B   2020          1      12       0       1          1          1 |
     +-----------------------------------------------------------------------+
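
With a very large dataset, listing every observation is not practical. One possible way to inspect the result, sketched here on the same fake data, is to flag a single observation per state-year group with egen's tag() function and list only those. The variable name first_in_group is just an illustrative choice.

```stata
* Rebuild the small test dataset used above
clear
set obs 12
gen state = cond(_n < 7, "A", "B")
egen year = seq(), from(2019) to(2020) block(3)
gen award_winner = real(word("0 0 0 0 0 1 0 1 1 1 1 1", _n))

bysort state year: egen better = mean(award_winner)

* Flag exactly one observation in each state-year group
* (first_in_group is a hypothetical name chosen for this sketch)
egen first_in_group = tag(state year)

* Show one line per state-year instead of every observation
list state year better if first_in_group, noobs
```

On real data the same two lines (egen tag() and list ... if) should give one row per state and year, which is much easier to scan than the full listing.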

The general principle is simple and important: to test code for statistical software, use a simple dataset for which there are known or obvious answers. Here "known" could be answers given by an existing implementation in the same or other software that is presumed correct.
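
The comparison can also be automated rather than read off a listing. The following is a minimal sketch, assuming the same fake dataset as above: assert stops with an error if any observation fails the check. The variable expected and the tolerance 1e-6 are my own choices for illustration, with the expected shares worked out by hand from the fake data.

```stata
* Rebuild the small test dataset used above
clear
set obs 12
gen state = cond(_n < 7, "A", "B")
egen year = seq(), from(2019) to(2020) block(3)
gen award_winner = real(word("0 0 0 0 0 1 0 1 1 1 1 1", _n))
tab year, gen(year)

bysort state year?: egen suggested = mean(award_winner)
bysort state year: egen better = mean(award_winner)

* The two approaches should agree in every observation
assert suggested == better

* Shares worked out by hand: A 2019 = 0, A 2020 = 1/3, B 2019 = 2/3, B 2020 = 1
* (expected is a hypothetical helper variable for this check)
gen double expected = cond(state == "A", cond(year == 2019, 0, 1/3), cond(year == 2019, 2/3, 1))

* Allow for rounding, since the computed means may be stored as float
assert reldif(better, expected) < 1e-6
```

The first assert needs no hand-computed answers, so in principle it could be run on the full dataset as well to confirm that grouping on the dummies and grouping on year give identical results there.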