I want a report of all the variables in my longitudinal/panel data set by city. PROC MEANS
has been great for this, but this cannot handle the inclusion of columns stating the first and last date for which the variable in question is nonmissing.
My data looks something like this:
data have;
input date $ city $ var1 var2 var3;
datalines;
2006Q1 NYC . 0.1 4
2006Q2 NYC . 0.48 6
2006Q3 NYC . 0.55 7
2006Q4 NYC 5 0.33 2
2007Q1 NYC 8 . 10
2007Q3 NYC 12 . 15
2006Q1 BOS 11 0.6 .
2006Q2 BOS 6 0.81 .
2006Q3 BOS 1 0.66 9
2006Q4 BOS 2 0.5 1
2007Q1 BOS 4 0.61 8
2007Q3 BOS 19 0.4 1
;
run;
My desired output is a table that looks something like this:
City Variable First Nonmiss Last Nonmiss
BOS var1 2006Q1 2007Q3
BOS var2 2006Q1 2007Q3
BOS var3 2006Q3 2007Q3
NYC var1 2006Q4 2007Q3
NYC var2 2006Q1 2006Q4
NYC var3 2006Q1 2007Q3
Perhaps a PROC TRANSPOSE
with a create table
in PROC SQL
is the best way to do this, but I'm open to suggestions -- and I'm not really certain where to start.
Thanks!
I would definitely do a proc transpose
approach here; it's very straightforward. This takes advantage of the fact that first.
and last.
are defined after the where
clause is implemented.
proc sort data=have;
by city date;
run;
proc transpose data=have out=have_t;
by city date;
var var1-var3;
run;
proc sort data=have_t;
by city _name_ date;
run;
data want;
set have_t;
by city _name_ date;
retain first_nonmiss;
where not missing(col1);
if first._name_ then do;
first_nonmiss = date;
end;
if last._name_ then do;
last_nonmiss = date;
output;
call missing(of first_nonmiss); *I like to do this for safety even though it is irrelevant here - later code might make it relevant;
end;
run;