Search code examples
countsassas-iml

Countifs in SAS


I have a SAS dataset with 3 columns. A FirmIndex, a ProducIndex and a third column called PrChange. Within each FirmIndex & ProductIndex group I want to count how many PrChange are different from . and from 0, and add that to a column called Number. Then I want to divide that column Number by the number of observations within each group which are not ..

Below an example of the dataset and desired output.

data prod;
input firmindex productindex PrChange Number Fract;
cards;

1   1   .   1   0.250
1   1   0.00    1   0.250
1   1   0.00    1   0.250
1   1   -0.40   1   0.250
1   1   0.00    1   0.250
1   2   .   2   1.000
1   2   1.00    2   1.000
1   2   0.30    2   1.000
1   3   .   4   0.800
1   3   0.70    4   0.800
1   3   1.00    4   0.800
1   3   0.70    4   0.800
1   3   0.00    4   0.800
1   3   -0.30   4   0.800
1   4   .   5   1.000
1   4   0.20    5   1.000
1   4   -1.00   5   1.000
1   4   -0.90   5   1.000
1   4   -0.50   5   1.000
1   4   1.00    5   1.000
2   1   .   2   1.000
2   1   0.30    2   1.000
2   1   -0.50   2   1.000
2   2   .   5   0.714
2   2   0.30    5   0.714
2   2   0.10    5   0.714
2   2   0.00    5   0.714
2   2   0.00    5   0.714
2   2   0.80    5   0.714
2   2   -0.20   5   0.714
2   2   0.40    5   0.714
2   3   .   1   1.000
2   3   0.60    1   1.000
2   4   .   5   0.714
2   4   -1.00   5   0.714
2   4   0.80    5   0.714
2   4   -0.20   5   0.714
2   4   0.00    5   0.714
2   4   0.00    5   0.714
2   4   -0.70   5   0.714
2   4   0.90    5   0.714
2   5   .   3   1.000
2   5   0.90    3   1.000
2   5   -0.70   3   1.000
2   5   -0.50   3   1.000
;
run;

Here is what I tried to generate the column number, but it is not working:

data work.prod;
    set work.prod;
    by firmindex productindex;
    if first.productindex  or first.firmindex then sum = 0;
        else if PrChange ne 0 and PrChange ne .;
        sum = sum + 1;
run;

Solution

  • Your problem here is that you need the number to divide by prior to running the rows of data. This is where SAS is different from Excel; SAS is row-based, meaning it takes your code and runs it against each row of data (more or less) one at a time, rather than dynamically looking at every cell from every other cell (like Excel). Much faster and more efficient but less flexible for stuff like this.

    Your particular question begs for a DoW loop. This takes over the normal data step loop and performs its own loop - twice. Once to calculate the number/fract values, then once to copy those to the BY group. Note I only check for last.productIndex; last/first transitions are always set on a second by variable when they're true for the first by variable.

    Here we do the first loop once for the first set of values( the first 5 records) then we re-loop through the same 5 records. Then for the next 3. Etc. Each time the two loops take the same number of rows so they always stay in sync.

    data want;
      do _n_ = 1 by 1 until (last.productIndex);
        set have;
        by firmindex productindex;
        number_denom = sum(number_Denom,not missing(PrChange));
        number       = sum(number, not (PrChange in (.,0)));
      end;
      fract = number/number_denom;
      do _n_ = 1 by 1 until (last.productIndex);
        set have;
        by firmindex productindex;
        output;
      end;
    run;