I have a long format dataset: ID, time varying variable, time and outcome (y).
Subjects have differing numbers of rows because they are observed at different times, and the outcome takes the values 0, 1 or 2. I need to keep the outcome value only at each subject's last time point and set the outcome to 0 in all other rows.
I can't figure out how to generate a new variable equal to the outcome only at max(time) within each ID.
id sbp y time
1 120 1 0
1 126 1 1
1 126 1 2
1 126 1 3
1 126 1 4
1 132 1 5
1 132 1 6
1 132 1 7
1 150 1 8
1 150 1 9
1 150 1 10
1 160 1 11
1 160 1 12
1 160 1 13
1 160 1 14
You seem to be asking quite different things:
1. Replacing outcome values before the last observation in each panel with 0.
2. Keeping only the last observation in each panel.
Here they are in turn:
bysort id (time) : replace y = 0 if _n < _N
by id: keep if _n == _N
If you just want the second, you need bysort id (time) rather than by id.
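The question also mentions generating a new variable rather than overwriting y. A minimal sketch of that variant, assuming the data are in memory as listed above; ylast is a hypothetical variable name, not one from the original post:

```stata
* Sort by time within id, then copy y into ylast only in the last row
* of each id; every earlier row gets 0. The original y is left unchanged.
* ylast is an illustrative name, not part of the original dataset.
bysort id (time) : gen ylast = cond(_n == _N, y, 0)
```

Here cond(_n == _N, y, 0) returns y when the observation is the last within its id and 0 otherwise, so this should give the same values as the replace approach while keeping y available for checking.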