Search code examples
loopsdatesasmissing-data

year-data missing for the date range in two columns in SAS with id


I have data in the following format

Startdate         perm_id     amount       enddate

01JUL2004   4294970232.    787560      01JUL2009
10AUG2005   4294970232     797880      01JUL2009
31JAN2007       4294970232     4344300     31JAN2012
20JUL2010       4294979802     4206400     06JUL2012
08JUL2011       4294979802     4663890     07JUL2012

`

I am trying to obtain the years for all missing years between start_date and end_date for each perm_id?

So it should look something like below

startdate       perm_id         tranche_amount        enddate
01JUL2004   4294970232     787560      01JUL2009
10AUG2005   4294970232     797880      01JUL2009
10AUG2006   4294970232     797880      01JUL2009
31JAN2008       4294970232     4344300     31JAN2012
31JAN2009       4294970232     4344300     31JAN2012
31JAN2010       4294970232     4344300     31JAN2012
31JAN2011       4294970232     4344300     31JAN2012

20JUL2010       4294979802     4206400     06JUL2012
08JUL2011       4294979802     4663890     07JUL2012

I have run the following code;

`your text`data loansdata;
`your text`select startdate, enddate and perm_id;
`your text`input startdate enddate perm_id date9.;
`your text`want=1;
`your text`do while (startdate <=year<= enddate);
    output;
    date=intnx('year', date, 1, 'b');
`end;
`format perm_id year tranche_amount  date9.;
`run;

Appreciate comments;

Regards`

Seb


Solution

  • Sounds like you just want to output one record per year from the input data and then aggregate by ID and YEAR.

    data have ;
      input startdate :date. perm_id :$12. amount enddate :date.;
      format startdate enddate date9.;
    cards;
    01JUL2004 4294970232  787560 01JUL2009
    10AUG2005 4294970232  797880 01JUL2009
    31JAN2007 4294970232 4344300 31JAN2012
    20JUL2010 4294979802 4206400 06JUL2012
    08JUL2011 4294979802 4663890 07JUL2012
    ;
    
    data step1;
      set have;
      do year=year(startdate) to year(enddate);
        output;
      end;
      keep perm_id amount year;
    run;
    
    proc summary data=step1 nway;
      class perm_id year;
      var amount;
      output out=want(drop=_type_) sum=;
    run;
    

    Result

    Obs     perm_id      year    _FREQ_     amount
    
      1    4294970232    2004       1       787560
      2    4294970232    2005       2      1585440
      3    4294970232    2006       2      1585440
      4    4294970232    2007       3      5929740
      5    4294970232    2008       3      5929740
      6    4294970232    2009       3      5929740
      7    4294970232    2010       1      4344300
      8    4294970232    2011       1      4344300
      9    4294970232    2012       1      4344300
     10    4294979802    2010       1      4206400
     11    4294979802    2011       2      8870290
     12    4294979802    2012       2      8870290
    

    If the data is large and performance is an issue then you could do the following changes to improve performance.

    Make STEP1 a VIEW instead of DATASET.

    data step1 / view=step1 ;
    

    Use BY statement for PERM_ID, which means the original data must be sorted by PERM_ID.

    proc summary data=step1 nway;
      by perm_id ;
      class year;
      var amount;
      output out=want(drop=_type_) sum=;
    run;