I use R and R package dplyr. I want in data frame fill missing values from previous row value (increase it by 1). It works well if I have a gap of one, but if there are 2 or more consecutive missing values then I have a problem. Let say here I have a data frame and for some rows it values are missing.
id val1 val2
1 0 0 0
2 1 1 1
3 2 NA NA
4 3 3 3
5 4 NA NA
6 5 NA NA
7 6 6 6
Here val2 is for which I perform the increase and val1 is just original value.
For NA values I want to take previous row value and increase it by 1.
I use dplyr package and lag function.
Here is the code:
library(dplyr)
tmp.data <- data.frame(id = c(0:6),val1 = c(0:6)) %>%
mutate(
val1 = ifelse( val1 %in% c(2,4:5),NA,val1),
val2 = val1,
)
tmp.data <- tmp.data %>%
mutate(
val2 = ifelse(
is.na(val2),
lag(val2) + 1,
val2
)
)
Solution 1a: Assuming increase from previous missing value(s), with 2 consecutive missing rows
Using dplyr
:
tmp.data %>%
mutate(val2 = ifelse(is.na(val2), lag(val2) + 1, val2),
val2 = ifelse(is.na(val2), lag(val2) + 1, val2))
Solution 1b: Assuming increase from previous missing value(s), with N consecutive missing rows
Using data.table
, zoo
and dplyr
:
setDT(tmp.data)[, consec := seq_len(.N), by=rleid(val2)]
tmp.data %>%
mutate(val2 = ifelse(is.na(val2), na.locf(val2) + consec, val2)) %>%
select(-consec)
Or written together:
tmp.data %>%
group_by(rleid(val2)) %>%
mutate(consec = seq_along(val2)) %>%
ungroup() %>%
mutate(val2 = ifelse(is.na(val2), na.locf(val2) + consec, val2)) %>%
select(id, val1, val2)
Solution 2: Not assuming increase from previous missing value(s)
Using dplyr
and zoo
:
tmp.data %>%
mutate(val2 = ifelse(is.na(val2), na.locf(val2) + 1, val2))