Similar to "R - Replace values starting in selected column by row", I want to replace, by row, any zero values occurring between the starting and ending columns with the preceding non-zero value. Given the data:
df <- structure(list(Mth1 = c(0L, 0L, 5L, 0L, 2L),
                     Mth2 = c(2L, 3L, 2L, 2L, 0L),
                     Mth3 = c(0L, 2L, 0L, 0L, 3L),
                     Mth4 = c(3L, 0L, 0L, 4L, 0L),
                     StartMth = c(2L, 2L, 1L, 2L, 1L),
                     EndMth = c(4L, 3L, 3L, 4L, 3L)),
                .Names = c("Mth1", "Mth2", "Mth3", "Mth4", "StartMth", "EndMth"), class = "data.frame",
                row.names = c("1", "2", "3", "4", "5"))
> df
  Mth1 Mth2 Mth3 Mth4 StartMth EndMth
1    0    2    0    3        2      4
2    0    3    2    0        2      3
3    5    2    0    0        1      3
4    0    2    0    4        2      4
5    2    0    3    0        1      3
I would like to use the values in StartMth and EndMth to determine where the replacements occur. The desired output is:
> df1
  Mth1 Mth2 Mth3 Mth4
1    0    2    2    3
2    0    3    2    0
3    5    2    2    0
4    0    2    2    4
5    2    2    3    0
Note that the values in the month columns designated by StartMth and EndMth will always be non-zero, and the values before StartMth and after EndMth will always be zero. There may be no zeros, one zero, or several zeros between the StartMth and EndMth columns that need replacing.
Edit: the actual data frame has many more months and rows and will grow as time passes, so an efficient and generic solution is required. StartMth and EndMth will always be the last columns.
I suspect the solution involves use of an apply variant, with perhaps an intermediate change to NA followed by the application of locf to fill the now missing value with the preceding one.
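This seems to work on the sample data, but could you check it against your real data? It assumes StartMth and EndMth are the last two columns and carries each non-zero value forward over the zeros that follow it:
n <- ncol(df) - 2L  # number of month columns; StartMth and EndMth come last
t(apply(X = df, MARGIN = 1, function(a) {
  rng <- a[n + 1L]:a[n + 2L]  # positions from StartMth to EndMth
  seg <- a[rng]
  # for each position, index of the most recent non-zero value in seg
  last_nz <- cummax(seq_along(seg) * (seg != 0))
  replace(x = a[seq_len(n)], list = rng, values = seg[last_nz])
}))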
#   Mth1 Mth2 Mth3 Mth4
# 1    0    2    2    3
# 2    0    3    2    0
# 3    5    2    2    0
# 4    0    2    2    4
# 5    2    2    3    0
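If the row-wise apply becomes slow as rows are added, one alternative is to walk the month columns once and fix all rows at the same time. The following is only a minimal sketch and has not been benchmarked. It assumes, as stated in the question, that StartMth and EndMth are the last two columns and that the value at StartMth is never zero. The names fill_by_column, m, inside and fix are made up for illustration.

```r
# Sample data from the question
df <- data.frame(
  Mth1 = c(0L, 0L, 5L, 0L, 2L),
  Mth2 = c(2L, 3L, 2L, 2L, 0L),
  Mth3 = c(0L, 2L, 0L, 0L, 3L),
  Mth4 = c(3L, 0L, 0L, 4L, 0L),
  StartMth = c(2L, 2L, 1L, 2L, 1L),
  EndMth = c(4L, 3L, 3L, 4L, 3L)
)

# Hypothetical helper: fill zeros between StartMth and EndMth, column by column
fill_by_column <- function(df) {
  n <- ncol(df) - 2L                        # month columns come first
  m <- as.matrix(df[, seq_len(n), drop = FALSE])
  start <- df[[n + 1L]]
  end <- df[[n + 2L]]
  # TRUE where a cell lies inside its own row's StartMth..EndMth range
  inside <- col(m) >= start & col(m) <= end
  # Left to right, so a filled value can be carried into the next column
  for (j in seq_len(n)[-1L]) {
    fix <- inside[, j] & m[, j] == 0
    m[fix, j] <- m[fix, j - 1L]
  }
  m
}

df1 <- fill_by_column(df)
df1
```

On the sample data this returns a matrix with the same values as the desired df1. The loop runs once per month column rather than once per row, which should help when there are far more rows than months.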