Search code examples
sasmissing-dataenterprise-guide4gl

How to fill missing values in numeric columns by using 0 in one table base on existing these columns in other tables in SAS Enterprise Guide?


I have tables in SAS Enterprise Guide like below:

TABLE 1

COL1 | COL2 | ...  | COLn
-----|------|------|-------
123  |      | ...  | xxx
     | AAA  | ...  | xxx
122  | BCC  | ...  | xxx
...  | ...  | ...  | xxx

And also 2 other tables like below:

TABLE 2

COL1 | ...| COLn
-----|----|------
998  | ...| xxx
999  | ...| xxx
001  | ...| xxx
...  | ...| ...

TABLE 3

COL8 | ...| COLn
-----|----|------
117  | ...| xxx
906  | ...| xxx
201  | ...| xxx
...  | ...| ...

As you can see TABLE 1 has missing values and I need to:

  • fill all missing values of NUMERIC variables by using 0 in each column of TABLE 1 if this column also exists in TABLE 2 or TABLE 3

So, as a result I need something like below, because COL1 (where missing is fill by 0) is in TABLE 2 and COL2 is neither in TABLE 2 nor in TABLE 3, moreover COL2 is not numeric

COL1 | COL2 | ...  | COLn
-----|------|------|-------
123  |      | ...  | xxx
0    | AAA  | ...  | xxx
122  | BCC  | ...  | xxx
...  | ...  | ...  | xxx

Only TABLE 1 is relevant, if some column from TABLE 1 is numeric, has missing and exists in TABLE 2 or TABLE 3 fill this column by 0

How can I do that in SAS Enterprise Guide ?


Solution

  • Create sample data sets

    data tab1;
    infile datalines delimiter='|' dsd;
    input col1 col2 $ col3 col4 $ col5 col6;
    datalines;
    1|ABC|.|ABC|5|9
    .|DEF|8||10|.
    ;
    run;
    
    data tab2;
    format col1 8. col2 $8. col5 8.;
    stop;
    run;
    
    data tab3;
    format col3 8. col4 $8.;
    stop;
    run;
    
    col1 col2 col3 col4 col5 col6
     1    ABC  .    ABC   5   9
     .    DEF  8         10   .
    

    If you are only interested in the number of missing values for numeric variables, then a single call to the MEANS procedure computes the answer.
    We use the ods output statement to output the results to a SAS table tofill. We use the stackods option, it allows the data set to resemble the default printed output from PROC MEANS.

    proc means data=tab1 nmiss n stackods;
        ods output summary=tofill(where=(nmiss>0));
    run;
    

    Then we use the same kind of approach as your last questions here and here to match retrieve the columns that exists in either TAB2 or TAB3. We fill the column names in a macro variable tofill

    proc sql noprint;
        create table names as 
        select distinct upcase(name) as name
        from sashelp.vcolumn
        where memname in ('TAB2', 'TAB3');
    
        select upcase(variable) into :tofill separated by ' '
        from tofill
        where upcase(variable) in (select name from names)
        ;
    quit;
    

    Finally, we replace the missing values of the columns we are interested in by 0

    data want;
        set tab1;
        array cols &tofill.;
        do over cols;
            if missing(cols) then cols = 0;
        end;
    run;
    
    col1 col2 col3 col4 col5 col6
     1    ABC   0  ABC   5    9
     0    DEF   8       10    .
    

    As desired

    • missing values of col1 and col3 are replaced by 0 because they are both numeric columns and exists in either tab2 or tab3
    • missing value of col4 remain missing as it is a character variable
    • missing value of col6 remain missing as this column does not exist in either tab2 or tab3.