Search code examples
sas

Is there an easy way to see number of observations for each column in SAS?


I would like to achieve a similar outcome as pandas' info() in python. I want to see the number of elements for each column (missing / non-missing), like with proc contents, just for all of the variables. I'm sure there is a simple way, I just can't find it:) Could anyone help me with this?

Example for info( ):

df.info(verbose=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   int_col    5 non-null      int64
 1   text_col   5 non-null      object
 2   float_col  5 non-null      float64

Solution

  • I don't know what INFO() function produces (you should include an example in your question) but it sounds like you want the number of distinct values (LEVELS) for the variables. PROC FREQ can do that.

    proc freq nlevels data=sashelp.cars;
       tables _all_ / noprint;
    run;
    

    Result

    The FREQ Procedure
    
                         Number of Variable Levels
    
                                                   Missing    Nonmissing
    Variable       Label                Levels      Levels        Levels
    --------------------------------------------------------------------
    Make                                    38           0            38
    Model                                  425           0           425
    Type                                     6           0             6
    Origin                                   3           0             3
    DriveTrain                               3           0             3
    MSRP                                   410           0           410
    Invoice                                425           0           425
    EngineSize     Engine Size (L)          43           0            43
    Cylinders                                8           1             7
    Horsepower                             110           0           110
    MPG_City       MPG (City)               28           0            28
    MPG_Highway    MPG (Highway)            33           0            33
    Weight         Weight (LBS)            348           0           348
    Wheelbase      Wheelbase (IN)           40           0            40
    Length         Length (IN)              67           0            67
    

    If you want to count missing/non-missing values then make a pair of special formats and use proc freq. SAS only has two types of variables, numbers and fixed length character strings. So make two formats, one for each type.

    There is a worked example on SAS Communities : https://communities.sas.com/t5/SAS-Programming/Counting-missing-and-non-missing-obs/m-p/740272

    *set input/output dataset names;
    %let INPUT_DSN = sashelp.cars;
    %let OUTPUT_DSN = want;
    
    * create formats for missing;
    proc format;
      value $missfmt ' ' = "Missing"  other = "Not_Missing";
      value nmissfmt low-high ="Not_Missing" other="Missing";
    run;
    
    * turn off output and capture the one way freq table TEMP dataset ; 
    ods select none;
    ods table onewayfreqs=temp;
    
    proc freq data=&INPUT_DSN.;
      table _all_ / missing;
      format _numeric_ nmissfmt. _character_ $missfmt.;
    run;
    
    * turn outputs back on ;
    ods select all;
    
    * Collapse to one observation per variable ;
    data &OUTPUT_DSN ;
      length name $32 missing not_missing total 8 ;
      set temp;
      by table notsorted ;
      if first.table then call missing(of missing not_missing);
      name = substr(table,7);
      if vvaluex(name)='Missing' then missing=frequency;
      else not_missing=frequency;
      retain missing not_missing;
      if last.table then do;
         missing=sum(0,missing);
         not_missing=sum(0,not_missing);
         total=sum(missing,not_missing);
         percent = divide(missing,total);
         output;
      end;
      keep name missing not_missing total percent;
    run;
    

    Resulting data:

    enter image description here