Search code examples
sqlsasprocsummary

SAS determine first and last non-missing ID / date by class for each variable


I want a report of all the variables in my longitudinal/panel data set by city. PROC MEANS has been great for this, but this cannot handle the inclusion of columns stating the first and last date for which the variable in question is nonmissing.

My data looks something like this:

data have;
input date $ city $ var1 var2 var3;
      datalines;
2006Q1  NYC   .  0.1   4
2006Q2  NYC   .  0.48  6
2006Q3  NYC   .  0.55  7
2006Q4  NYC   5  0.33  2
2007Q1  NYC   8    .  10
2007Q3  NYC  12    .   15
2006Q1  BOS  11  0.6   .
2006Q2  BOS   6  0.81  .
2006Q3  BOS   1  0.66  9
2006Q4  BOS   2  0.5   1
2007Q1  BOS   4  0.61  8
2007Q3  BOS  19  0.4   1
;
run;

My desired output is a table that looks something like this:

City Variable  First Nonmiss  Last Nonmiss
 BOS    var1   2006Q1         2007Q3 
 BOS    var2   2006Q1         2007Q3 
 BOS    var3   2006Q3         2007Q3 
 NYC    var1   2006Q4         2007Q3 
 NYC    var2   2006Q1         2006Q4
 NYC    var3   2006Q1         2007Q3

Perhaps a PROC TRANSPOSE with a create table in PROC SQL is the best way to do this, but I'm open to suggestions -- and I'm not really certain where to start.

Thanks!


Solution

  • I would definitely do a proc transpose approach here; it's very straightforward. This takes advantage of the fact that first. and last. are defined after the where clause is implemented.

    proc sort data=have;
      by city date;
    run;
    
    proc transpose data=have out=have_t;
      by city date;
      var var1-var3;
    run;
    
    proc sort data=have_t;
      by city _name_ date;
    run;
    
    data want;
      set have_t;
      by city _name_ date;
      retain first_nonmiss;
      where not missing(col1);
      if first._name_ then do;
        first_nonmiss = date;
      end;
      if last._name_ then do;
        last_nonmiss = date;
        output;
        call missing(of first_nonmiss);  *I like to do this for safety even though it is irrelevant here - later code might make it relevant;
      end;
    run;