Search code examples
sas

Split a dataset by ranges of dates


suppose to have the following dataset:

       ID                 date            
        A                 201806
        B                 201809
        C                 201901
        D                 201905
        E                 201906
        F                 202002
        G                 202003
        H                 202004

Is there a way to split the dataset year by year from June to June of the next year? Let say:

The new dataset 1 will be:

       ID                 date            
        A                 201806
        B                 201809
        C                 201901
        D                 201905

from June (01) of 2018 to (31) May 2019

The new dataset 2 will be:

       ID                 date   
        E                 201906
        F                 202002
        G                 202003
        H                 202004

from June (01) of 2019 to (30) April 2020

Thank you in advance


Solution

  • From the posted results, I assume that 202006 should be 202004 in the last observations of your posted data.

    It is rarely a good idea to split data like this, but you can do it like this

    data have;
    input ID $ date : yymmn6.;
    format date yymmn6.;
    datalines;
    A 201806
    B 201809
    C 201901
    D 201905
    E 201906
    F 202002
    G 202003
    H 202004
    ;
    
    data temp;
       set have;
       from = intnx('year.6', date, 0, 'b');
       to   = intnx('year.6', date, 0, 'e');
    run;
    
    proc sort data = temp;
       by to;
    run;
    
    data _null_;
    
       if _N_ = 1 then do;                
          dcl hash h(dataset:"temp(obs=0)", multidata:'y', ordered : 'A'); 
          h.definekey('ID', 'date');                                                                        
          h.definedone();                                          
       end;
    
       do until(last.to);                                     
          set temp;                                             
          by to;
          h.add();                                                  
       end;
    
       h.output(dataset : cats('data_', _N_));                                   
       h.clear();     
     
    run;