Search code examples
datedatetimegroup-bysas

How to calculate difference between dates by group in SAS


I have data that resembles the below, and need to calculate the average time (in months) between dates in the Event_date column, grouped by Subject_ID. In other words, for each subject in the data, I need to know the average number of months between events relevant to that subject. Each subject has a different number of events that are associated with them. Is there a relatively straightforward way to do this? I know how to do this in R, but I'm relatively new to SAS and haven't yet been able to figure out a solution.

Thanks in advance for any help this community can provide!

Subject_ID Event_date
1 Date_1
1 Date_2
1 Date_3
2 Date_1
2 Date_2
3 Date_1
3 Date_2
3 Date_3
3 Date_4
3 Date_5
3 Date_6
4 Date_1
4 Date_2
5 Date_1
5 Date_2
5 Date_3
5 Date_4
6 Date_1
6 Date_2

Solution

  • In SAS, you can take advantage of by-group processing. We know we need to compare each date to the previous date, but we don't want to do it for the first date in the group. To do this task, we'll need to know two functions:

    1. lag(): Gets the previous row's value
    2. intck(): Counts time intervals between two dates (e.g. days, months, years, etc.)

    For every row except the first row, we want to run itnck('month', lag(date), date) to count the number of months between events. But the lag function is a bit special and doesn't behave as expected within a conditional statement, so we'll just run it on every row and set the first row in each group to missing instead.

    From there, you can run something like SQL or PROC MEANS to get the average value between dates per group.

    data have;
        format date date9.;
        input id date:date9.;
        datalines;
    1 01JAN2024
    1 08FEB2024
    1 12APR2024
    2 04JAN2024
    2 21APR2024
    3 06MAY2024
    3 09JUL2024
    3 12JUL2024
    3 18AUG2024
    ;
    run;
    
    data diffs;
        set have;
        by id;
    
        months_dif = intck('month', lag(date), date);
    
        if(first.id) then months_dif = .;
    run;
    
    /* Get the average per month using SQL */
    proc sql;
        create table want as
            select id, mean(months_dif) as avg_months_dif
            from diffs
            group by id
        ;
    quit;
    
    /* Alternatively you can use proc means */
    proc means data=diffs noprint;
        by id;
        output out=want(drop=_:)
            mean(months_dif) = avg_months_dif
        ;
    run;
    
    id  avg_months_dif
    1   1.5
    2   3
    3   1