This is a variation on the NA theme that I have not been able to find an answer to. I have monthly observations by column with a large number of series by row. Some missing values are genuine but some should be zero. I want to replace missing values for a given series with zeros but only after a value for that series has been observed.
For example, given:
Mth1 Mth2 Mth3 Mth4
1 1 2 1 3
2 NA 3 2 1
3 NA 2 1 NA
4 NA NA 2 NA
5 2 2 NA 2
I want to change this to:
Mth1 Mth2 Mth3 Mth4
1 1 2 1 3
2 NA 3 2 1
3 NA 2 1 0
4 NA NA 2 0
5 2 2 0 2
I want something like the locf
function, which is able to leave missing values prior to the first positive observation, but I want to fill with zeros rather than use the last observation.
Here is another base R method using matrix indexing:
df[is.na(df) & t(apply(!is.na(df), 1, cummax))] <- 0
df
Mth1 Mth2 Mth3 Mth4
1 1 2 1 3
2 NA 3 2 1
3 NA 2 1 0
4 NA NA 2 0
5 2 2 0 2
is.na(df)
returns a logical matrix indicating the location of NA values. This is (logically) chained to t(apply(!is.na(df), 1, cummax))
which indicates if a non-NA value occurred in a previous row element. elements of the data.frame for which both of these are TRUE are replaced with 0.