How to use fill_by_function() with na.approx() [linear interpolation] inside dplyr


I'm going through the documentation for padr:

https://cran.r-project.org/web/packages/padr/vignettes/padr.html

I modified the vignette example slightly so that I can apply linear interpolation (zoo::na.approx()) to the padded data, but I am getting an error. The setup:

library(tidyverse)
library(padr)
library(zoo)

set.seed(123)

emergency %>% 
  filter(title == 'EMS: DEHYDRATION') %>% 
  thicken(interval = 'day') %>% 
  group_by(time_stamp_day) %>% 
  summarise(nr = n() + as.integer(runif(1, 1, 999)) ) %>% 
  pad()

results in:

# A tibble: 307 × 2
   time_stamp_day    nr
           <date> <int>
1      2015-12-12    79
2      2015-12-13    42
3      2015-12-14    NA
4      2015-12-15    NA
5      2015-12-16    NA
6      2015-12-17    NA
7      2015-12-18    88
8      2015-12-19    NA
9      2015-12-20    NA
10     2015-12-21    NA
# ... with 297 more rows

Now I want to fill the NA values linearly, for example interpolating between 42 and 88 in rows 2 to 7. I thought the best way to accomplish this would be to use zoo::na.approx() inside padr::fill_by_function():

emergency %>% 
 filter(title == 'EMS: DEHYDRATION') %>% 
 thicken(interval = 'day') %>% 
 group_by(time_stamp_day) %>% 
 summarise(nr = n() + as.integer(runif(1, 1, 99)) ) %>% 
 pad() %>% 
 fill_by_function(nr, na.approx)

But I am getting the following error:

Error in inds[i] <- which(colnames_x == as.character(cols[[i]])) : 
  replacement has length zero

Any ideas on how to start fixing this?


Solution

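  • You don't need fill_by_function() here; dplyr::mutate() with na.approx() is enough. Pass na.rm = FALSE so that leading and trailing NAs are kept and the result has the same length as the input column: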

    library(dplyr); library(tibble); library(zoo)  # dplyr provides %>% and mutate()
    emergency <- as_tibble(read.table(text="time_stamp_day    nr
    1      2015-12-12    79
    2      2015-12-13    42
    3      2015-12-14    NA
    4      2015-12-15    NA
    5      2015-12-16    NA
    6      2015-12-17    NA
    7      2015-12-18    88
    8      2015-12-19    NA
    9      2015-12-20    NA
    10     2015-12-21    NA",header=TRUE,stringsAsFactors=FALSE))
    
    emergency %>% mutate(nr = na.approx(nr, na.rm = FALSE))
    
    # A tibble: 10 × 2
       time_stamp_day    nr
                <chr> <dbl>
    1      2015-12-12  79.0
    2      2015-12-13  42.0
    3      2015-12-14  51.2
    4      2015-12-15  60.4
    5      2015-12-16  69.6
    6      2015-12-17  78.8
    7      2015-12-18  88.0
    8      2015-12-19    NA
    9      2015-12-20    NA
    10     2015-12-21    NA
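
The same mutate() step can be appended directly to the pipeline from the question. This is a minimal sketch, assuming padr, dplyr and zoo are installed and that the `emergency` dataset shipped with padr is used; the random noise from the question is left out so the result is reproducible, and the name `daily` is only illustrative. As far as I can tell from the padr documentation, fill_by_function() replaces every NA in a column with one summary value of the non-missing values (its `fun` argument defaults to mean), which would explain why an unnamed na.approx is treated as a column name and why that function is probably not suited to row-wise interpolation.

```r
library(dplyr)
library(padr)
library(zoo)

daily <- emergency %>%
  filter(title == "EMS: DEHYDRATION") %>%
  thicken(interval = "day") %>%               # adds the time_stamp_day column
  group_by(time_stamp_day) %>%
  summarise(nr = n()) %>%                     # one row per day that has calls
  pad() %>%                                   # inserts the missing days with nr = NA
  mutate(nr = na.approx(nr, na.rm = FALSE))   # linear interpolation across each NA run

head(daily, 10)
```

Because pad() only fills between the first and last observed day, every NA here sits between two observations and gets interpolated. In the ten-row example above, rows 8 to 10 stay NA because no later observation exists; na.approx() forwards extra arguments to stats::approx(), so `na.approx(nr, na.rm = FALSE, rule = 2)` should repeat the last observed value there instead, if that behaviour is wanted.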