Looking to update this macro to be HASH + point = key. We have started to exceed our memory limits with our current version of this macro for one of our data runs. The reason I'm asking for help is because I don't have a lot of time and have never really analyzed this code since it wasn't part of my process until recently.
What I don't really understand from, https://www.lexjansen.com/nesug/nesug11/ld/ld01.pdf, is how does the RID get set and how to incorporate it into our macro. I actually don't even know if it is possible to do it this way with our current macro.
Any help would be greatly appreciated.
%macro hashmerge2(varnm,onto,from,byvars,obsqty);
%let data_vars = %trim (&varnm);
%let data_vars_a = %sysfunc(tranwrd(&data_vars.,%str( ),%str(" , ")));
%let data_vars_b = %sysfunc(tranwrd(&data_vars.,%str( ), %str(,)));
%let data_key = %trim (&byvars);
%let data_key = %sysfunc(tranwrd(&data_key.,%str( ), %str(" , ")));
%if %index(&varnm,' ') > 0 %then %let varnm3=%substr(%substr(&varnm,1,%index(&varnm,' ')),1,4);
%else %let varnm3=%substr(&varnm,1,4);
data &onto(drop=rc) miss&varnm3(drop=rc);
if 0 then set &onto &from(keep=&varnm. &byvars.);
declare hash h_merge (dataset: "&from.");
rc = h_merge.DefineKey ("&data_key.");
rc = h_merge.DefineData ("&data_vars_a.");
rc = h_merge.DefineDone ();
do until (eof);
set &onto end = eof;
call missing(&data_vars_b.);
rc = h_merge.find ();
if rc = 0 then do;
output &onto;
from = "&from.";
end;
else do;
output miss&varnm3 &onto;
from = "&onto.";
end;
end;
stop;
run;
%mend;
So I think this is what you are looking for, but it still needs to load all of the key values from the "lookup" table into the hash object. But it could save space by instead of also loading the non-key variables it just needs to load the observation number that matches the key variables.
%macro hash_merge_point
/*-----------------------------------------------------------------------------
Merge variables ONTO large table FROM small table using POINT= dataset option.
-----------------------------------------------------------------------------*/
(varnm /* Space delimited list of variable to retrieve */
,onto /* Dataset to update */
,from /* Dataset to get values from */
,byvars /* Space delimited list of key variables to match on */
);
%local missds key_vars;
%let missds=%scan(&varnm,1,%str( ));
%let missds=miss%substr(&missds,1,%sysfunc(min(28,%length(&missds))));
%let key_vars="%sysfunc(tranwrd(%sysfunc(compbl(&byvars)),%str( )," "))";
data &onto(drop=rc) &missds(drop=rc);
if 0 then set &onto &from(keep=&varnm. &byvars.);
declare hash h_merge ();
rc = h_merge.DefineKey (&key_vars);
rc = h_merge.DefineData ('_point');
rc = h_merge.DefineDone ();
do _point=1 to _nobs;
set &from(keep=&byvars) point=_point nobs=_nobs;
rc = h_merge.add();
end;
do until (eof);
set &onto end = eof;
rc = h_merge.find ();
if rc = 0 then do;
set &from (keep=&varnm) point=_point;
from = "&from.";
output &onto;
end;
else do;
call missing(of &varnm);
from = "&onto.";
output ;
end;
end;
stop;
run;
%mend hash_merge_point;
So here is an trivial example:
data lookup;
input id age sex $1.;
cards;
1 10 F
2 20 .
4 30 M
;
data master ;
input id wt ;
cards;
1 100
2 150
3 180
4 200
;
%hash_merge_point
/*-----------------------------------------------------------------------------
Merge variables ONTO large table FROM small table using POINT= dataset option.
-----------------------------------------------------------------------------*/
(varnm=age sex /* Space delimited list of variable to retrieve */
,onto=master /* Dataset to update */
,from=lookup /* Dataset to get values from */
,byvars=id /* Space delimited list of key variables to match on */
);
If the target table already has the variables being created by the merge (so you just want to overwrite the current values) then you can use the MODIFY statement instead of the SET statement to modify the dataset in place. But you might want to make sure you have a backup of the table before trying this. Also note that if you want flag for the source, the from
variable, then that variable also needs to exist.
So with this updated master table:
data master ;
input id wt ;
length age 8 sex $1 from $50;
cards;
1 100
2 150
3 180
4 200
;
And this version of the macro:
%macro hash_merge_point
/*-----------------------------------------------------------------------------
Merge variables ONTO large table FROM small table using POINT= dataset option.
-----------------------------------------------------------------------------*/
(varnm /* Space delimited list of variable to retrieve */
,onto /* Dataset to update */
,from /* Dataset to get values from */
,byvars /* Space delimited list of key variables to match on */
);
%local key_vars;
%let key_vars="%sysfunc(tranwrd(%sysfunc(compbl(&byvars)),%str( )," "))";
data &onto;
if 0 then set &onto (keep=&byvars.);
declare hash h_merge ();
rc = h_merge.DefineKey (&key_vars);
rc = h_merge.DefineData ('_point');
rc = h_merge.DefineDone ();
do _point=1 to _nobs;
set &from(keep=&byvars) point=_point nobs=_nobs;
rc = h_merge.add();
end;
do until (eof);
modify &onto end = eof;
rc = h_merge.find ();
if rc = 0 then do;
set &from (keep=&varnm) point=_point;
from = "&from.";
end;
else from = "&onto.";
replace;
end;
stop;
run;
%mend hash_merge_point;
If you run this code:
proc print data=master;
title 'BEFORE';
run;
%hash_merge_point
/*-----------------------------------------------------------------------------
Merge variables ONTO large table FROM small table using POINT= dataset option.
-----------------------------------------------------------------------------*/
(varnm=age sex /* Space delimited list of variable to retrieve */
,onto=master /* Dataset to update */
,from=lookup /* Dataset to get values from */
,byvars=id /* Space delimited list of key variables to match on */
);
proc print data=master;
title 'AFTER';
run;
You get this result: