Search code examples
rdplyrnafillnamutate

Fill in NAs using mutate


I have the following data frame:

library(tidyverse)

v1 = c(1 , NA, 3, 5, NA, NA, 2, 12, NA, 5, NA, 0, 1, 2, 6, 8)
alt = rnorm(length(v1), 0, 1)
tb = tibble(v1, alt)

print(tb)

A tibble: 16 × 2
      v1     alt
   <dbl>   <dbl>
 1     1  0.495 
 2    NA  0.726 
 3     3  0.667 
 4     5  0.955 
 5    NA -1.68  
 6    NA -1.21  
 7     2 -1.96  
 8    12  1.47  
 9    NA  0.372 
10     5  1.07  
11    NA  0.531 
12     0  0.102 
13     1  1.34  
14     2  0.0872
15     6 -0.391 
16     8 -0.250 

I need to fill NAs in v1 using the mutate. The idea is that when there is one NA in v1, it will be filled by the multiplication between the variable alt and the value of v1 prior to the NA.

I solve this using loop for, but this may take time to depend on the data set.

for (i in 1:length(v1)) {
  if( is.na(tb[i, 'v1']) ){
    tb[i, 'v1'] = tb[i-1, 'v1']*tb[i, 'alt']
  }
}

This yields:

A tibble: 16 × 2
       v1     alt
    <dbl>   <dbl>
 1  1      0.495 
 2  0.726  0.726 
 3  3      0.667 
 4  5      0.955 
 5 -8.38  -1.68  
 6 10.1   -1.21  
 7  2     -1.96  
 8 12      1.47  
 9  4.47   0.372 
10  5      1.07  
11  2.65   0.531 
12  0      0.102 
13  1      1.34  
14  2      0.0872
15  6     -0.391 
16  8     -0.250 

My question is: How do I fill NAs using my condition and mutate or others dplyr verbs ?


Solution

  • You could make use of cumprod:

    tb %>%
      group_by(id = cumsum(!is.na(v1))) %>%
      mutate(v1 = v1[1]*cumprod(alt^is.na(v1)))
    
    # A tibble: 16 x 3
    # Groups:   id [11]
           v1     alt    id
        <dbl>   <dbl> <int>
     1  1      0.495      1
     2  0.726  0.726      1
     3  3      0.667      2
     4  5      0.955      3
     5 -8.4   -1.68       3
     6 10.2   -1.21       3
     7  2     -1.96       4
     8 12      1.47       5
     9  4.46   0.372      5
    10  5      1.07       6
    11  2.66   0.531      6
    12  0      0.102      7
    13  1      1.34       8
    14  2      0.0872     9
    15  6     -0.391     10
    16  8     -0.25      11
    

    In Base R:

    prd <- function(a, b){
      ifelse(is.na(a), prd(b * c(1,head(a,-1)),b), a)
    }
    
    
    transform(tb, v1 = prd(v1, alt))
           v1     alt
    1   1.000  0.4950
    2   0.726  0.7260
    3   3.000  0.6670
    4   5.000  0.9550
    5  -8.400 -1.6800
    6  10.164 -1.2100
    7   2.000 -1.9600
    8  12.000  1.4700
    9   4.464  0.3720
    10  5.000  1.0700
    11  2.655  0.5310
    12  0.000  0.1020
    13  1.000  1.3400
    14  2.000  0.0872
    15  6.000 -0.3910
    16  8.000 -0.2500