Search code examples
stata

How to apply maximum value to a whole group using Stata


I want to generate a variable max_count wherein, for a given group ID, if the value of count for the current year is higher than for the previous year then max_count takes the value for the current year. The value for the current year will be applied to the succeeding years until a higher value than that in the current year occurs. For instance, in the example below for ID 2, the value of count in 2001 is 10 but the succeeding years (2002 and 2003) have values less than 10 (i.e. 2 and 4) so 2002 and 2003 then take the value of 10 (the highest value after 2001).

I used this Stata code but it doesn't work:

bysort id (Year): gen max_count=max(count, count[_n-1])

The highest value is only applied to the immediately succeeding year and not to all succeeding years.

ID  Year    count   max_count
1   2000    5          5
1   2001    0          5
1   2002    3          5
1   2003    7          7
2   2000    5          5
2   2001    10        10
2   2002    2         10
2   2003    4         10
3   2000    2         2
3   2001    5         5
3   2002    9         9
3   2003    6         9

Solution

  • clear 
    input ID    Year    count   max_count
        1   2000    5          5
        1   2001    0          5
        1   2002    3          5
        1   2003    7          7
        2   2000    5          5
        2   2001    10        10
        2   2002    2         10
        2   2003    4         10
        3   2000    2         2
        3   2001    5         5
        3   2002    9         9
        3   2003    6         9
    end 
    
    bysort ID (Year) : gen wanted = count[1]
    by ID : replace wanted = max(wanted[_n-1], count) if _n > 1 
    
    list, sepby(ID)
    
         +---------------------------------------+
         | ID   Year   count   max_co~t   wanted |
         |---------------------------------------|
      1. |  1   2000       5          5        5 |
      2. |  1   2001       0          5        5 |
      3. |  1   2002       3          5        5 |
      4. |  1   2003       7          7        7 |
         |---------------------------------------|
      5. |  2   2000       5          5        5 |
      6. |  2   2001      10         10       10 |
      7. |  2   2002       2         10       10 |
      8. |  2   2003       4         10       10 |
         |---------------------------------------|
      9. |  3   2000       2          2        2 |
     10. |  3   2001       5          5        5 |
     11. |  3   2002       9          9        9 |
     12. |  3   2003       6          9        9 |
         +---------------------------------------+
    
        
    

    There is a detailed discussion of how to get such records (the maximum or minimum so far is the "record", as in sport) in this Stata FAQ.

    For a one-line solution, install rangestat from SSC and then

    rangestat (max) WANTED = count, int(Year . 0) by(ID) 
    

    The problem of when the record occurred is naturally related:

    by ID : gen when = Year[1]
    by ID : replace when = cond(wanted > wanted[_n-1], Year, when[_n-1]) if _n > 1