
SAS: How to use RETAIN statement to create a summed variable in the DATA step, equivalent to the SUM statement output in PROC PRINT


In SAS, I'm trying to create variables that hold the sum of other variables. Specifically, I am trying to create two variables: Total_All_Ages, the sum of the 2013 US population estimates POPESTIMATE2013, and Total_18Plus, the sum of the 2013 US population aged 18 and older, POPEST18PLUS2013.

I want these totals to appear as if I had used the SUM statement in PROC PRINT (where the sum appears in a new row at the bottom of the column). However, I do not want to use PROC PRINT; I want to create the output using only the DATA step.

I need to do this with the RETAIN statement (together with the INPUT statement).

My code is as follows:

data _NULL_;
retain Total_All_Ages Total_18Plus;
infile RAWfoldr DLM=',' firstobs=3 obs=53;
informat STATE $2. NAME $20.;
input SUMLEV REGION $ DIVISION STATE $ NAME $ POPESTIMATE2013 POPEST18PLUS2013 PCNT_POPEST18PLUS;
    Total_All_Ages = sum(Total_All_Ages, POPESTIMATE2013);
    Total_18Plus = sum(Total_18Plus, POPEST18PLUS2013);
keep STATE NAME POPESTIMATE2013 POPEST18PLUS2013 Total_All_Ages Total_18Plus;
format POPESTIMATE2013 comma11. POPEST18PLUS2013 comma11.;
file print notitles;
if _n_=1 then put '=== U.S. Resident Population Estimates for All Ages and ===
                  Ages 18 or Older by State (in Alphabetical Order), 2013';
if _n_=1 then put ' ';
if _n_=1 then put @5 'FIPS Code' @16 'State Name' @40 'All Ages' @55 'Ages 18 or Older';
if _n_=1 then put ' ';
put @5 STATE @16 NAME @40 POPESTIMATE2013 @55 POPEST18PLUS2013;
run;

As you can see, I create the two variables with assignment statements after my INPUT statement, and I also list them in my RETAIN statement. However, I'm not sure how to make them appear in my output the way I described.

I want them to appear as a Total line at the bottom of the output, like this:

                                                        POPESTIMATE2013  POPEST18PLUS2013
                                                        112312234        1234123412341234
                                                        23413412341234   213412341234



                       ============                      ============     ============
                       Total                             23423423429      242234545345 

Is there a way to write these totals on a new line at the very bottom of the output (similar to how I write the column headings with the if _n_=1 code)?

Let me know if I need to explain myself better. I appreciate any help with this. Thank you.


Solution

    If I understand your question, you're almost there.

    First, add END=EOF to your INFILE statement. This creates a temporary variable, EOF, that is 0 while SAS reads the data and is set to 1 when SAS reads the last record. The END= option works the same way on a SET statement, and the END= variable is not written to any output data set.
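    To see END= with a SET statement in isolation, here is a minimal sketch. It assumes a small hypothetical data set, WORK.POP, built from made-up values (not real census figures); the names POP, POP18, Total_All, Total_18, and WORK.TOTALS are illustrative only. RETAIN carries the running totals from one iteration to the next, and the END= variable triggers the final step:

    ```sas
    /* Hypothetical sample data: made-up values, not real census figures */
    data work.pop;
      input STATE $ POP POP18;
      datalines;
    01 100 75
    02 200 150
    04 300 225
    ;
    run;

    /* END=LAST sets LAST to 1 only while the final observation is read */
    data work.totals;
      set work.pop end=last;
      retain Total_All Total_18;           /* keep values across iterations */
      Total_All = sum(Total_All, POP);     /* SUM() skips the initial missing value */
      Total_18  = sum(Total_18, POP18);
      if last then do;
        put 'Total: ' Total_All= Total_18=; /* written to the log */
        output;                             /* keep only the final totals */
      end;
      keep Total_All Total_18;
    run;
    ```

    For these made-up values the log shows Total_All=600 and Total_18=450, and WORK.TOTALS holds only that final row.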

    Next, add this DO block, which executes only when SAS is reading the last line of the file:

      if eof then do;
        put @5 9*'=' @40 11*'=' @55 11*'=';
        put @5 'Total' @40 Total_All_Ages comma11. @55 Total_18Plus comma11.;
      end;
    

    Here, PUT statements write the separator line (the n*'=' form repeats the = sign n times) and the totals. The complete code is below:

    data _NULL_;
      /* RETAIN keeps the running totals across iterations; they start as
         missing, which the SUM function ignores */
      retain Total_All_Ages Total_18Plus;
      /* END=EOF sets EOF to 1 when the last record is read */
      infile RAWfoldr DLM=',' firstobs=3 obs=53 end=eof;
      informat STATE $2. NAME $20.;
      input SUMLEV REGION $ DIVISION STATE $ NAME $ POPESTIMATE2013 POPEST18PLUS2013 PCNT_POPEST18PLUS;
      Total_All_Ages = sum(Total_All_Ages, POPESTIMATE2013);
      Total_18Plus = sum(Total_18Plus, POPEST18PLUS2013);
      /* No KEEP statement needed: DATA _NULL_ does not write a data set */
      format POPESTIMATE2013 comma11. POPEST18PLUS2013 comma11.;
      file print notitles;
      /* Title and column headings, written once before the first state;
         the / pointer starts a new output line */
      if _n_=1 then do;
        put '=== U.S. Resident Population Estimates for All Ages and ==='
          / '    Ages 18 or Older by State (in Alphabetical Order), 2013';
        put ' ';
        put @5 'FIPS Code' @16 'State Name' @40 'All Ages' @55 'Ages 18 or Older';
        put ' ';
      end;
      /* One detail line per state */
      put @5 STATE @16 NAME @40 POPESTIMATE2013 comma11. @55 POPEST18PLUS2013 comma11.;
      /* After the last state: a separator line, then the totals */
      if eof then do;
        put @5 9*'=' @40 11*'=' @55 11*'=';
        put @5 'Total' @40 Total_All_Ages comma11. @55 Total_18Plus comma11.;
      end;
    run;
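    A side note on the RETAIN + SUM() pair used above: the DATA step SUM statement (variable + expression;) implicitly retains its variable, starts it at 0, and treats a missing expression as 0, so it produces the same running total here. Since you said you need RETAIN, the answer keeps that form; this minimal sketch just compares the two on made-up data. WORK.DEMO, POP, Tot_Retain, and Tot_SumStmt are hypothetical names:

    ```sas
    /* Hypothetical sample data, including one missing value */
    data work.demo;
      input POP;
      datalines;
    100
    200
    .
    ;
    run;

    data work.compare;
      set work.demo end=eof;
      /* Approach 1: RETAIN + SUM() function, as in the answer */
      retain Tot_Retain;
      Tot_Retain = sum(Tot_Retain, POP);
      /* Approach 2: SUM statement; implicitly retained, starts at 0,
         and a missing POP adds nothing */
      Tot_SumStmt + POP;
      if eof then output;
      keep Tot_Retain Tot_SumStmt;
    run;
    ```

    Both totals come out to 300. One difference: if every value were missing, Tot_SumStmt would be 0 while Tot_Retain would stay missing.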
    

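    One final note on your code: numeric values written with a format in the PUT statement (formatted output, such as comma11. above) are right-aligned within the format's width by default. You can state the alignment explicitly by appending the -r modifier to the format (-l left-aligns and -c centers instead), e.g.: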

      put @5 STATE @16 NAME @40 POPESTIMATE2013 comma11.-r @55 POPEST18PLUS2013 comma11.-r;
    

    A format given in the PUT statement overrides the one assigned in your FORMAT statement for that output.
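    To check the alignment behavior without the raw file, here is a small sketch using the PUT function, which, as far as I know, accepts the same -l/-c/-r modifiers as the PUT statement. The data set WORK.ALIGN and its variable names are hypothetical; the vertical bars in the log mark where each 11-character value ends:

    ```sas
    data work.align;
      length Default_Fmt Left_Fmt Right_Fmt $ 11;
      x = 1234567;
      Default_Fmt = put(x, comma11.);    /* numeric formats right-align by default */
      Left_Fmt    = put(x, comma11.-l);  /* -l moves the padding to the right */
      Right_Fmt   = put(x, comma11.-r);  /* -r restates the default explicitly */
      /* $CHAR11. keeps leading blanks so the alignment is visible in the log */
      put Default_Fmt $char11. '|' / Left_Fmt $char11. '|' / Right_Fmt $char11. '|';
    run;
    ```

    Default_Fmt and Right_Fmt should be identical, which is why -r on a numeric format only matters for readability; -l is the modifier that actually changes the layout.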