Search code examples
sasdatastep

SAS - Working with consecutive months?


From the sample data below, I'm trying to identify accounts (by ID and SEQ) where there is an occurence of STATUS_DATE for at least 3 consecutive months. I've been messing with this for a while and I'm not at all sure how to tackle it.

Sample Data:

ID     SEQ   STATUS_DATE
11111    1   01/01/2014
11111    1   02/10/2014
11111    1   03/15/2014
11111    1   05/01/2014
11111    2   01/30/2014
22222    1   06/20/2014
22222    1   07/15/2014
22222    1   07/16/2014
22222    1   08/01/2014
22222    2   02/01/2014
22222    2   09/10/2014

What I need to return:

ID      SEQ   STATUS_DATE
11111    1    01/01/2014
11111    1    02/10/2014
11111    1    03/15/2014
22222    1    06/20/2014
22222    1    07/15/2014
22222    1    07/16/2014
22222    1    08/01/2014

Any help would be appreciated.


Solution

  • Here is one method:

    data have;
    input ID     SEQ   STATUS_DATE $12.;
    datalines;
    11111    1   01/01/2014
    11111    1   02/10/2014
    11111    1   03/15/2014
    11111    1   05/01/2014
    11111    2   01/30/2014
    22222    1   06/20/2014
    22222    1   07/15/2014
    22222    1   07/16/2014
    22222    1   08/01/2014
    22222    2   02/01/2014
    22222    2   09/10/2014
    ;
    run;
    
    data grouped (keep = id seq status_date group) groups (keep = group2);
        set have;
        sasdate = input(status_date, mmddyy12.);
        month = month(sasdate);
        year = year(sasdate);
        pdate = intnx('month', sasdate, -1);
        if lag(year) = year(sasdate) and lag(month) = month(sasdate) then group+0;
        else if lag(year) = year(pdate) and lag(month) = month(pdate) then count+1;
        else do;
            group+1;
            count = 0;
        end;
        if count = 0 and lag(count) > 1 then do;
            group2 = group-1;
            output groups;
        end;
        output grouped;
    run;
    
    data want (keep = id seq status_date);
        merge grouped groups (in=a rename=(group2=group));
        by group;
        if a;
    run;
    

    Basically I give observations the same group number if they are in consecutive months, then also create a data set with group numbers of groups with more than 2 observations. Then I merge those two data sets and only keep observations which are in the second data set, that is, those with more than 2 observations.