Search code examples
sasmodemissing-data

Fill in missing values with mode in SAS


I think the logic to replace missingness is quite clear but when I dump it to SAS I find it too complicated to start with.


Solution

  • Given no code was provided, I'll give you some rough directions to get you started, but put it on you to determine any specifics.

    First, lets create a month column for the data and then calculate the modes for each key for each month. Additionally, lets put this new data in its own dataset.

    data temp;
       set original_data;
       month = month(date);
    run;
    
    proc univariate data=temp modes;
       var values;
       id key month;
       out=mode_data;
    run;
    

    However, this procedure calculates the mode in a very specific way that you may not want (defaults to the lowest in the case of a tie and produces no mode if nothing occurs at least twice) Documentation: http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_univariate_sect027.htm

    If that doesn't work for you, I recommend using proc sql to get a count of each key, month, value combination and calculating your own mode from there.

    proc sql;
       create table mode_data as select distinct
       key, month, value, count(*) as distinct_count
       from temp
       group by key, month, value;
    quit;
    

    From there you might want to create a table containing all months in the data.

    proc sql;
       create table all_months as select distinct month
       from temp;
    quit;
    

    Don't forget to merge back in any missing months from to the mode data and use the lag or retain functions to search previous months for "old modes".

    Then simply merge your fully populated mode data back to the the temp dataset we created above and impute the missing values to the mode when value is missing (i.e. value = .)

    Hope that helps get you started.