I have data in the following format
Startdate perm_id amount enddate
01JUL2004 4294970232. 787560 01JUL2009
10AUG2005 4294970232 797880 01JUL2009
31JAN2007 4294970232 4344300 31JAN2012
20JUL2010 4294979802 4206400 06JUL2012
08JUL2011 4294979802 4663890 07JUL2012
`
I am trying to obtain the years for all missing years between start_date and end_date for each perm_id?
So it should look something like below
startdate perm_id tranche_amount enddate
01JUL2004 4294970232 787560 01JUL2009
10AUG2005 4294970232 797880 01JUL2009
10AUG2006 4294970232 797880 01JUL2009
31JAN2008 4294970232 4344300 31JAN2012
31JAN2009 4294970232 4344300 31JAN2012
31JAN2010 4294970232 4344300 31JAN2012
31JAN2011 4294970232 4344300 31JAN2012
20JUL2010 4294979802 4206400 06JUL2012
08JUL2011 4294979802 4663890 07JUL2012
I have run the following code;
`your text`data loansdata;
`your text`select startdate, enddate and perm_id;
`your text`input startdate enddate perm_id date9.;
`your text`want=1;
`your text`do while (startdate <=year<= enddate);
output;
date=intnx('year', date, 1, 'b');
`end;
`format perm_id year tranche_amount date9.;
`run;
Appreciate comments;
Regards`
Seb
Sounds like you just want to output one record per year from the input data and then aggregate by ID and YEAR.
data have ;
input startdate :date. perm_id :$12. amount enddate :date.;
format startdate enddate date9.;
cards;
01JUL2004 4294970232 787560 01JUL2009
10AUG2005 4294970232 797880 01JUL2009
31JAN2007 4294970232 4344300 31JAN2012
20JUL2010 4294979802 4206400 06JUL2012
08JUL2011 4294979802 4663890 07JUL2012
;
data step1;
set have;
do year=year(startdate) to year(enddate);
output;
end;
keep perm_id amount year;
run;
proc summary data=step1 nway;
class perm_id year;
var amount;
output out=want(drop=_type_) sum=;
run;
Result
Obs perm_id year _FREQ_ amount
1 4294970232 2004 1 787560
2 4294970232 2005 2 1585440
3 4294970232 2006 2 1585440
4 4294970232 2007 3 5929740
5 4294970232 2008 3 5929740
6 4294970232 2009 3 5929740
7 4294970232 2010 1 4344300
8 4294970232 2011 1 4344300
9 4294970232 2012 1 4344300
10 4294979802 2010 1 4206400
11 4294979802 2011 2 8870290
12 4294979802 2012 2 8870290
If the data is large and performance is an issue then you could do the following changes to improve performance.
Make STEP1 a VIEW instead of DATASET.
data step1 / view=step1 ;
Use BY statement for PERM_ID, which means the original data must be sorted by PERM_ID.
proc summary data=step1 nway;
by perm_id ;
class year;
var amount;
output out=want(drop=_type_) sum=;
run;