I have a dataset that contains 81 columns with person id, 79 binary variables and a cost variable:
id h1 h2 h3 ... h79 cost
1 1 0 1 1 15
2 1 1 1 1 80
3 0 1 1 0 10
...
each person id have one row of records. Now I want to choose which of the two h(binary) variables have more than 50 unique person id. if then do calculate their total cost. I guess a good way to approach it is to create a array with all h variables and use two DO LOOPS? But what if I want to see a group of three variables or maybe four or five? And also how am I going to store the combination of variable names so I could know this combination of variables has this amount of total cost. So I think the final output is going to looks like this:
combinations total cost
h1&h3 95
h2&h3 90
h1&h2&h3. 80
thank you for your help!
A DATA step can use the ALLCOMB
or ALLCOMBI
routines to iterate through the k-subset combinations of an array of size n. A hash can be use to accumulate the count and total cost of each specific k-subset asserting all true conditions.
options mprint;
data have (keep=id flag: cost);
do id = 1 to 3;
array flag(79) flag01-flag79;
do i = 1 to dim(flag);
flag(i) = ranuni(1) < 0.5;
end;
cost = ceil(10+100*ranuni(123));
output;
end;
run;
Example
data _null_;
if 0 then set have;* prep pdv;
array x flag:;
n = dim(x);
k = 2; ways2 = comb(dim(x),k); put 'NOTE: ' n= k= ways2=;
k = 3; ways3 = comb(dim(x),k); put 'NOTE: ' n= k= ways3=;
k = 4; ways4 = comb(dim(x),k); put 'NOTE: ' n= k= ways4=;
k = 5; ways5 = comb(dim(x),k); put 'NOTE: ' n= k= ways5=;
array var(5) $32;
length count cost_sum 8;
declare hash all_true(hashexp:15, ordered:'A');
all_true.defineKey('var1', 'var2', 'var3', 'var4', 'var5');
all_true.defineData('var1', 'var2', 'var3', 'var4', 'var5', 'count', 'cost_sum');
all_true.defineDone();
do until (end);
set have end=end;
array f flag:;
%macro track_all_true(K=);
array index&K._[&K];
call missing (of index&K._[*]); %* reset search tracking variables;
call missing (of var[*]); %* reset search tracking variables;
%* search all combinations for those that are all true;
do p = 1 to comb(n,&K);
call allcombi(n, &K, of index&K._[*], add, remove);
%* check each item in the combination;
do q = 1 to &K while(x[index&K._[q]]);
end;
if q > &K then do; %* each item was true;
do q = 1 to &K;
which_index = index&K._[q];
which_var = vname( x[which_index] );
var(q) = which_var;
end;
if all_true.find() ne 0 then do; %* track first occurrence of the combination;
cost_sum = cost;
count = 1;
all_true.add();
end;
else do; %* accumulate count and cost information for the combination;
cost_sum + cost;
count + 1;
all_true.replace();
end;
end;
end;
%mend;
%track_all_true(K=2)
%track_all_true(K=3)
%track_all_true(K=4)
%track_all_true(K=5)
end;
all_true.output(dataset:'count_cost');
stop;
run;