Search code examples
statadata-management

variable showing the highest value attained of another variable, recorded so far, over time


I have a dataset of patients and their alcohol-related patient data over time (in years) like below

clear
input long patid float(year cohort)
1051 1994 1
2051 1972 1
2051 1989 2
2051 1990 2
2051 2000 2
2051 2001 3
2051 2002 1
2051 2003 2
8051 1995 1
8051 1996 1
8051 2003 1
end
label values cohort cohortlab
label define cohortlab 0 "general population" 1 "no alcohol data" 2 "indeterminate" 3 "non-drinker" 4 "low_risk" 5 "hazardous" 6 "AUD" , replace

I would like to create a variable that shows the highest level of alcohol code that has been used so far at any (year) point in a patient's record, such that the dataset would be like below:

clear
input long patid float(year cohort highestsofar)
1051 1994 1 1
2051 1972 1 1
2051 1989 2 2
2051 1990 2 2
2051 2000 2 2
2051 2001 3 3
2051 2002 1 3
2051 2003 2 3
8051 1995 1 1
8051 1996 1 1
8051 2003 1 1
end
label values cohort cohortlab
label values highestsofar cohortlab
label define cohortlab 0 "general population" 1 "no alcohol data" 2 "indeterminate" 3 "lifetime_abstainer" 4 "low_risk" 5 "hazardous" 6 "AUD" , replace

Solution

  • Thanks for the clear example and question.

    The problem is already covered by an FAQ link here on the StataCorp website. Here's a one-line solution using rangestat from SSC.

    clear
    input long patid float(year cohort)
    1051 1994 1
    2051 1972 1
    2051 1989 2
    2051 1990 2
    2051 2000 2
    2051 2001 3
    2051 2002 1
    2051 2003 2
    8051 1995 1
    8051 1996 1
    8051 2003 1
    end
    label values cohort cohortlab
    label define cohortlab 0 "general population" 1 "no alcohol data" 2 "indeterminate" 3 "non-drinker" 4 "low_risk" 5 "hazardous" 6 "AUD" , replace
    
    rangestat (max) highestsofar = cohort, interval(year . 0) by(patid) 
    
    list, sepby(patid) 
    
         +-------------------------------------------+
         | patid   year            cohort   highes~r |
         |-------------------------------------------|
      1. |  1051   1994   no alcohol data          1 |
         |-------------------------------------------|
      2. |  2051   1972   no alcohol data          1 |
      3. |  2051   1989     indeterminate          2 |
      4. |  2051   1990     indeterminate          2 |
      5. |  2051   2000     indeterminate          2 |
      6. |  2051   2001       non-drinker          3 |
      7. |  2051   2002   no alcohol data          3 |
      8. |  2051   2003     indeterminate          3 |
         |-------------------------------------------|
      9. |  8051   1995   no alcohol data          1 |
     10. |  8051   1996   no alcohol data          1 |
     11. |  8051   2003   no alcohol data          1 |
         +-------------------------------------------+