I have the following data frame:
library(tidyverse)
v1 = c(1 , NA, 3, 5, NA, NA, 2, 12, NA, 5, NA, 0, 1, 2, 6, 8)
alt = rnorm(length(v1), 0, 1)
tb = tibble(v1, alt)
print(tb)
A tibble: 16 × 2
v1 alt
<dbl> <dbl>
1 1 0.495
2 NA 0.726
3 3 0.667
4 5 0.955
5 NA -1.68
6 NA -1.21
7 2 -1.96
8 12 1.47
9 NA 0.372
10 5 1.07
11 NA 0.531
12 0 0.102
13 1 1.34
14 2 0.0872
15 6 -0.391
16 8 -0.250
I need to fill NAs
in v1
using the mutate. The idea is that when there is one NA
in v1
, it will be filled by the multiplication between the variable alt
and the value of v1
prior to the NA.
I solve this using loop for
, but this may take time to depend on the data set.
for (i in 1:length(v1)) {
if( is.na(tb[i, 'v1']) ){
tb[i, 'v1'] = tb[i-1, 'v1']*tb[i, 'alt']
}
}
This yields:
A tibble: 16 × 2
v1 alt
<dbl> <dbl>
1 1 0.495
2 0.726 0.726
3 3 0.667
4 5 0.955
5 -8.38 -1.68
6 10.1 -1.21
7 2 -1.96
8 12 1.47
9 4.47 0.372
10 5 1.07
11 2.65 0.531
12 0 0.102
13 1 1.34
14 2 0.0872
15 6 -0.391
16 8 -0.250
My question is: How do I fill NAs
using my condition and mutate
or others dplyr
verbs ?
You could make use of cumprod
:
tb %>%
group_by(id = cumsum(!is.na(v1))) %>%
mutate(v1 = v1[1]*cumprod(alt^is.na(v1)))
# A tibble: 16 x 3
# Groups: id [11]
v1 alt id
<dbl> <dbl> <int>
1 1 0.495 1
2 0.726 0.726 1
3 3 0.667 2
4 5 0.955 3
5 -8.4 -1.68 3
6 10.2 -1.21 3
7 2 -1.96 4
8 12 1.47 5
9 4.46 0.372 5
10 5 1.07 6
11 2.66 0.531 6
12 0 0.102 7
13 1 1.34 8
14 2 0.0872 9
15 6 -0.391 10
16 8 -0.25 11
In Base R:
prd <- function(a, b){
ifelse(is.na(a), prd(b * c(1,head(a,-1)),b), a)
}
transform(tb, v1 = prd(v1, alt))
v1 alt
1 1.000 0.4950
2 0.726 0.7260
3 3.000 0.6670
4 5.000 0.9550
5 -8.400 -1.6800
6 10.164 -1.2100
7 2.000 -1.9600
8 12.000 1.4700
9 4.464 0.3720
10 5.000 1.0700
11 2.655 0.5310
12 0.000 0.1020
13 1.000 1.3400
14 2.000 0.0872
15 6.000 -0.3910
16 8.000 -0.2500