Search code examples
rdataframefillna

Replace NA values in dataframe starting in varying columns


This is a variation on the NA theme that I have not been able to find an answer to. I have monthly observations by column with a large number of series by row. Some missing values are genuine but some should be zero. I want to replace missing values for a given series with zeros but only after a value for that series has been observed.

For example, given:

   Mth1 Mth2 Mth3 Mth4
1    1    2    1    3
2    NA   3    2    1
3    NA   2    1   NA
4    NA   NA   2   NA
5    2    2    NA   2

I want to change this to:

   Mth1 Mth2 Mth3 Mth4
1    1    2    1    3
2    NA   3    2    1
3    NA   2    1    0
4    NA   NA   2    0
5    2    2    0    2

I want something like the locf function, which is able to leave missing values prior to the first positive observation, but I want to fill with zeros rather than use the last observation.


Solution

  • Here is another base R method using matrix indexing:

    df[is.na(df) & t(apply(!is.na(df), 1, cummax))] <- 0
    df
      Mth1 Mth2 Mth3 Mth4
    1    1    2    1    3
    2   NA    3    2    1
    3   NA    2    1    0
    4   NA   NA    2    0
    5    2    2    0    2
    

    is.na(df) returns a logical matrix indicating the location of NA values. This is (logically) chained to t(apply(!is.na(df), 1, cummax)) which indicates if a non-NA value occurred in a previous row element. elements of the data.frame for which both of these are TRUE are replaced with 0.