Search code examples
sas

Could someone please explain what the following lines of this code do?


This is the code that I have with me:

%let data=sashelp.cars;

    proc transpose data=&data(obs=0) out=names;
       var _all_;
       run;
    proc sql;
       select cats('_',_name_,'=missing(',_name_,');') into: stmts separated by ' ' from names;
       run;
    
    data missing;
       set &data;
       _BIGN = 1;
       /*%m_expand_varlist(data=&data,expr=cats('_',_name_,'=missing(',_name_,');'));*/
       &stmts;
       keep _:;
       run;
    proc summary data=missing;
       var _numeric_;
       output out=smry sum=;
       run;
    proc transpose data=smry(drop=_type_ _freq_) out=smry_;
       run;
   

The goal of this code is to output the number of missing values for both character and numeric variables in the data set. The code accomplishes that objective but I have difficulty in understanding the purpose of certain lines in the code.

May I know what the following part of the code does?

 select cats('_',_name_,'=missing(',_name_,');') into: stmts separated by '  ' from names;

I understand that the into part just stores the value into the macro variable stmts but what does "separated by ' ' from names" in the above line mean?

data missing;
       set &data;
       _BIGN = 1;
       &stmts;
       keep _:;
       run;

And in the above portion of the code, what is the purpose of "keep :"? What does the ":" do in that? And is the "_BIGN = 1" necessary?

And also in the final output table called smry_, I get underscores before the names of the variables. But I don't need these underscores. What can I do to remove them? When I removed the underscore after the "keep :", the underscores in the smry table went away but I was left with only 10 rows instead of 15. Help would be appreciated. Thank you.


Solution

  • Before answering your questions, let me disclaim that this is not the way to go if you simply want a count of missing values for each numeric variable in your data.

    However, this seems to be more of a practice assignment than an actual problem.

    1. The Separated By Clause simply inserts the specific string between the values in the macro variable if the data source has >1 items. In this case the names data set has 15 items, so all 15 values are listed with a few spaces between them.
    2. The colon operator in the keep statement tells the data step to keep only variables prefixed with an underscore.
    3. _BIGN is not strictly necessary. However, it seems that the author of the code wants a simple count of observations in the final data set. That is all it does.
    4. The underscores are applied to each variable name in the creation of the macro variable in. It is probably done to avoid conflicts between variable names (though this is technically still possible). Obviously, you can simply remove the underscore in the final data set.