Search code examples
sascountiffrequency-distribution

SAS countif function without using proc


I need to do a frequency distribution of this one column data without using any proc freq; proc sql. I'm only allowed to use proc sort.

In excel I would use a simple countif, but I don't know how to do this in SAS given above contraint.

data sample_grades;
input grades $;
datalines;
C
A
A
B
B+
A-
W
A
A-
A
A-
A
B+
A-
A
B+
B+
A-
B+
;
run;

I came up with this but it stopped counting at A-

data new_dataset;
set Fall2016;
by grade;
retain grade frequency;
if grade = 'A' then frequency+1;
else if grade = 'A-' then frequency=0;
if grade = 'A-' then frequency+1;
else if grade = 'B' then frequency=0;
if grade = 'B' then frequency+1;
else if grade = 'B+' then frequency=0;
if grade = 'B+' then frequency+1;
else if grade = 'B-' then frequency=0;
if grade = 'B-' then frequency+1;
else if grade = 'C' then frequency=0;
if grade = 'C' then frequency+1;
else if grade = 'W' then frequency=0;
if grade = 'W' then frequency+1;
else frequency+0;
if last.grade then do;
frequency+0;
end;
run;

Ultimately I'm looking to a simple table like this: enter image description here


Solution

  • It helps to think of data steps as loops, that run through the input dataset and pick up values as they go. I was going to explain how your attempt worked in that respect, but it quickly became confusing. Here's my attempt at the problem:

    data sample_grades;
    input grades $;
    datalines;
    C
    A
    A
    B
    B+
    A-
    W
    A
    A-
    A
    A-
    A
    B+
    A-
    A
    B+
    B+
    A-
    B+
    ;
    run;
    

    Sort the data by grades first, so that BY-GROUP processing can take place:

    proc sort data=sample_grades;
      by grades;
    run;
    

    Now set up your data step as follows:

    data new_dataset;
      set sample_grades;
      by grades;
      /* If it's the first of the grades then set the frequency to zero */
      if first.grades then frequency=0;
      /* Increment the frequency value regardless of the value of grades */
      frequency+1;
      /* When the last of the grades values is found, output. This gives the total frequency for the grade in the output table */
      if last.grades then output;
    run;