Search code examples
arrayssassas-macro

how to select all combination of column names and do calculation on those columns in SAS


I have a dataset that contains 81 columns with person id, 79 binary variables and a cost variable:

id h1 h2 h3 ... h79  cost
1  1  0  1      1     15
2  1  1  1      1     80
3  0  1  1      0     10
...

each person id have one row of records. Now I want to choose which of the two h(binary) variables have more than 50 unique person id. if then do calculate their total cost. I guess a good way to approach it is to create a array with all h variables and use two DO LOOPS? But what if I want to see a group of three variables or maybe four or five? And also how am I going to store the combination of variable names so I could know this combination of variables has this amount of total cost. So I think the final output is going to looks like this:

combinations              total cost
h1&h3                       95
h2&h3                       90
h1&h2&h3.                   80

thank you for your help!


Solution

  • A DATA step can use the ALLCOMB or ALLCOMBI routines to iterate through the k-subset combinations of an array of size n. A hash can be use to accumulate the count and total cost of each specific k-subset asserting all true conditions.

    options mprint;
    
    data have (keep=id flag: cost);
      do id = 1 to 3;
        array flag(79) flag01-flag79;
        do i = 1 to dim(flag);
          flag(i) = ranuni(1) < 0.5;
        end;
        cost = ceil(10+100*ranuni(123));
        output;
      end;
    run;
    

    Example

    data _null_;
      if 0 then set have;* prep pdv;
    
      array x flag:;
      n = dim(x);
    
      k = 2; ways2 = comb(dim(x),k); put 'NOTE: ' n= k= ways2=;
      k = 3; ways3 = comb(dim(x),k); put 'NOTE: ' n= k= ways3=;
      k = 4; ways4 = comb(dim(x),k); put 'NOTE: ' n= k= ways4=;
      k = 5; ways5 = comb(dim(x),k); put 'NOTE: ' n= k= ways5=;
    
      array var(5) $32;
      length count cost_sum 8;
    
      declare hash all_true(hashexp:15, ordered:'A');
      all_true.defineKey('var1', 'var2', 'var3', 'var4', 'var5');
      all_true.defineData('var1', 'var2', 'var3', 'var4', 'var5', 'count', 'cost_sum');
      all_true.defineDone();
    
      do until (end);
        set have end=end;
        array f flag:;
    
        %macro track_all_true(K=);
    
          array index&K._[&K];
          call missing (of index&K._[*]);  %* reset search tracking variables;
          call missing (of var[*]);        %* reset search tracking variables;
    
          %* search all combinations for those that are all true;
          do p = 1 to comb(n,&K);
    
            call allcombi(n, &K, of index&K._[*], add, remove);
    
            %* check each item in the combination;
            do q = 1 to &K while(x[index&K._[q]]);
            end;
    
            if q > &K then do; %* each item was true;
              do q = 1 to &K;
                which_index = index&K._[q];
                which_var = vname( x[which_index] );
                var(q) = which_var;
              end;
    
              if all_true.find() ne 0 then do; %* track first occurrence of the combination;
                cost_sum = cost;
                count = 1;
                all_true.add();
              end;
              else do; %* accumulate count and cost information for the combination;
                cost_sum + cost;
                count + 1;
                all_true.replace();
              end;
            end;
    
          end;
    
        %mend;
    
        %track_all_true(K=2)
        %track_all_true(K=3)
    
        %track_all_true(K=4)
        %track_all_true(K=5)
    
      end;
    
      all_true.output(dataset:'count_cost');
      stop;
    run;