I have the following data:
data test ;
INPUT id st end ;
datalines ;
1 17 .
1 17 .
1 17 17.5
1 17 .
1 17 .
2 15 .
2 15 .
2 15 .
2 15 .
2 15 15.75
run ;
data test2 ;
set test ;
dur = end - st ;
run ;
I would like to fill in the missing values of "dur" by "id" to give:
id st end dur
1 17 . 17.5
1 17 . 17.5
1 17 17.5 17.5
1 17 . 17.5
2 15 . 15.75
2 15 . 15.75
2 15 . 15.75
2 15 . 15.75
2 15 15.75 15.75
As the rows are not in any order to use proc sort
and retain
, how can I replace the missing values of "dur"? (Without using a merge as the real dataset is extremely large).
You can use proc sort
and retain
as you say to achieve the right dur column, however it doesn't give you the same structure as your example output:
proc sort data = test;
by id descending end;
run;
data test2 ;
set test ;
by id;
retain dur;
if first.id then dur = end - st ;
run;
Then test2
looks like:
id st end dur
1 17 17.5 0.5
1 17 . 0.5
1 17 . 0.5
1 17 . 0.5
1 17 . 0.5
2 15 15.75 0.75
2 15 . 0.75
2 15 . 0.75
2 15 . 0.75
2 15 . 0.75
However, the proc sort
could be pretty slow if your data becomes too large. A proc SQL
method may be more useful:
proc sql;
create table test2 as
select id, st, end, max(end-st) as dur
from test
group by id;
quit;