Objective: Go from Have table + Help table to Want table. The current implementation (below) is slow. I believe this is a good example of how not to use SAS Macros, but I'm curious as to whether... 1. the macro approach could be salvaged / made fast enough to be viable (e.g. proc append is supposed to speed up the action of stacking datasets, but I was unable to see any performance gains.) 2. what all the alternatives would look like.
I have written a non-macro solution that I will post below for comparison sake.
Data:
data have ;
input name $ term $;
cards;
Joe 2000
Joe 2002
Joe 2008
Sally 2001
Sally 2003
; run;
proc print ; run;
data help ;
input terms $ ;
cards;
2000
2001
2002
2003
2004
2005
2006
2007
2008
; run;
proc print ; run;
data want ;
input name $ term $ status $;
cards;
Joe 2000 here
Joe 2001 gone
Joe 2002 here
Joe 2003 gone
Joe 2004 gone
Joe 2005 gone
Joe 2006 gone
Joe 2007 gone
Joe 2008 here
Sally 2001 here
Sally 2002 gone
Sally 2003 here
; run;
proc print data=have ; run;
I can write a little macro to get me there for each individual:
%MACRO RET(NAME);
proc sql ;
create table studtermlist as
select distinct term
from have
where NAME = "&NAME"
;
SELECT Max(TERM) INTO :MAXTERM
FROM HAVE
WHERE NAME = "&NAME"
;
SELECT MIN(TERM) INTO :MINTERM
FROM HAVE
WHERE NAME = "&NAME"
;
CREATE TABLE TERMLIST AS
SELECT TERMS
FROM HELP
WHERE TERMS BETWEEN "&MINTERM." and "&MAXTERM."
ORDER BY TERMS
;
CREATE TABLE HEREGONE_&Name AS
SELECT
A.terms ,
"&Name" as Name,
CASE
WHEN TERMS EQ TERM THEN 'Here'
when term is null THEN 'Gone'
end as status
from termlist a left join studtermlist b
on a.terms eq b.term
;
quit;
%MEND RET ;
%RET(Joe);
%RET(Sally);
proc print data=HEREGONE_Joe; run;
proc print data=HEREGONE_Sally; run;
But it's incomplete. If I loop through for (presumably quite a few names)...
*******need procedure for all names - grab info on have ;
proc sql noprint;
select distinct name into :namelist separated by ' '
from have
; quit;
%let n=&sqlobs ;
%MACRO RETYA ;
OPTIONS NONOTEs ;
%do i = 1 %to &n ;
%let currentvalue = %scan(&namelist,&i);
%put ¤tvalue ;
%put &i ;
%RET(¤tvalue);
%IF &i = 1 %then %do ;
data base; set HEREGONE_¤tvalue; run;
%end;
%IF &i gt 1 %then %do ;
proc sql ; create table base as
select * from base
union
select * from HEREGONE_¤tvalue
;
drop table HEREGONE_¤tvalue;
quit;
%end;
%end ;
OPTIONS NOTES;
%MEND;
%RETYA ;
proc sort data=base ; by name terms; run;
proc print data=base; run;
So now I have want, but with 6,000 names, it takes over 20 minutes.
Let's try the alternative solution. For each name find the min/max term via a proc SQL data step. Then use a data step to create the time period table and merge that with your original table.
*Sample data;
data have ;
input name $ term ;
cards;
Joe 2000
Joe 2002
Joe 2008
Sally 2001
Sally 2003
; run;
*find min/max of each name;
proc sql;
create table terms as
select name, min(term) as term_min, max(term) as term_max
from have
group by name
order by name;
quit;
*Create table with the time periods for each name;
data empty;
set terms;
do term=term_min to term_max;
output;
end;
drop term_min term_max;
run;
*Create final table by merging the original table with table previously generated;
proc sql;
create table want as
select a.name, a.term, case when missing(b.term) then 'Gone'
else 'Here' end as status
from empty a
left join have b
on a.name=b.name
and a.term=b.term
order by a.name, a.term;
quit;
EDIT: Now looking at your macro solution, part of the problem is that you're scanning your table too many times.
Rather than loop through to append the data, take advantage of a naming convention and use a single data step to append all outputs.
%MACRO RET(NAME);
proc sql noprint;
SELECT MIN(TERM), Max(TERM) INTO :MINTERM, :MAXTERM
FROM HAVE
WHERE NAME = "&NAME"
;
CREATE TABLE _HG_&Name AS
SELECT
A.terms ,
"&Name" as Name,
CASE
WHEN TERMS EQ TERM THEN 'Here'
when term is null THEN 'Gone'
end as status
from help a
left join have b
on a.terms eq b.term
and b.name="&name"
where a.terms between "&minterm" and "&maxterm";
;
quit;
%MEND RET ;
*call macro;
proc sort data=have;
by name term;
run;
data _null_;
set have;
by name;
if first.name then do;
str=catt('%ret(', name, ');');
call execute(str);
end;
run;
*append results;
data all;
set _hg:;
run;