r tidyverse purrr smoothing

Smoothing a time series with multiple y values per x


I have a host of timeseries, all part of a large dataframe with many grouping variables, that I need to smooth. I am getting comfortable with purrr, so a group_by() %>% nest() approach seems reasonable. Each nested dataframe will look something like this:

data <- structure(list(time = c(0, 0, 6, 6, 12, 12, 18, 18, 24, 24, 30, 
    30, 36, 36, 42, 42, 48, 48, 54, 54, 60, 60, 66, 66, 72, 72, 78, 
    78, 84, 84, 90, 90, 96, 96, 102, 102, 108, 108, 114, 114, 120, 
    120, 126, 126, 132, 132, 138, 138), confluence = c(14.68764, 
    19.73559, 2.897458, 3.478664, 3.46789, 4.122939, 4.270285, 4.534702, 
    4.838222, 5.578382, 5.938678, 6.337464, 7.116287, 7.824044, 8.50258, 
    10.16758, 11.13803, 13.25756, 18.46681, 11.97336, 24.45211, 14.61754, 
    30.7178, 19.91414, 37.93423, 26.0687, 45.91022, 33.69255, 57.83714, 
    42.13477, 69.2417, 54.8134, 79.81015, 68.28696, 89.50358, 78.21476, 
    95.31271, 87.13279, 97.71458, 94.69752, 98.59245, 97.71144, 98.8707, 
    98.87447, 98.99731, 99.42957, 99.02805, 99.6716)), row.names = c(NA, 
    -48L), class = c("tbl_df", "tbl", "data.frame"))

library(tidyverse)

ggplot(data = data) +
    geom_point(aes(x = time, y = confluence)) +   
    geom_smooth(aes(x = time, y = confluence))


My desired output is an additional column holding the smoothed value for each x (timepoint). Since there are two y-values (confluence) per x, each timepoint should get the same smoothed value twice.

The problem is that I can't find a smoothing function that returns output in this shape, so that I could simply append a smoothed column via mutate, e.g. data <- data %>% mutate(smooth_y = FUN(time, confluence)). I looked at loess(confluence ~ time, data = data), which returns a model object (a fitted curve with a bunch of parameters, I guess) rather than a plain vector, and at supsmu(data$time, data$confluence), which drops duplicate x values from its output.

Is there a smoothing function that returns a value for every x? Or is there a simple way to merge vectors of different lengths inside mutate? The number of x/y pairs may differ between the split groups (some missing values, maybe some duplicates), so the mapping back has to be robust and cannot rely on simply duplicating the y-values.

Desired output:

# head(data)
#
# # A tibble: 6 x 3
#    time confluence smooth
#   <dbl>      <dbl>  <dbl>
# 1     0      14.7   14.7 
# 2     0      19.7   14.7 
# 3     6       2.90   8.72
# 4     6       3.48   8.72
# 5    12       3.47   5.10
# 6    12       4.12   5.10



Solution

  • I just realized I was being dense. It is pretty trivial to store the output of the smoothing function in its own data frame and then do a full_join on the x-axis values.

    data <- structure(list(time = c(0, 0, 6, 6, 12, 12, 18, 18, 24, 24, 30, 
        30, 36, 36, 42, 42, 48, 48, 54, 54, 60, 60, 66, 66, 72, 72, 78, 
        78, 84, 84, 90, 90, 96, 96, 102, 102, 108, 108, 114, 114, 120, 
        120, 126, 126, 132, 132, 138, 138), confluence = c(14.68764, 
        19.73559, 2.897458, 3.478664, 3.46789, 4.122939, 4.270285, 4.534702, 
        4.838222, 5.578382, 5.938678, 6.337464, 7.116287, 7.824044, 8.50258, 
        10.16758, 11.13803, 13.25756, 18.46681, 11.97336, 24.45211, 14.61754, 
        30.7178, 19.91414, 37.93423, 26.0687, 45.91022, 33.69255, 57.83714, 
        42.13477, 69.2417, 54.8134, 79.81015, 68.28696, 89.50358, 78.21476, 
        95.31271, 87.13279, 97.71458, 94.69752, 98.59245, 97.71144, 98.8707, 
        98.87447, 98.99731, 99.42957, 99.02805, 99.6716)), row.names = c(NA, 
        -48L), class = c("tbl_df", "tbl", "data.frame"))
    
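    library(tidyverse)
    
    # supsmu() returns a list with x (sorted, duplicates removed) and y (smoothed values)
    smooth <- data.frame(supsmu(data$time, data$confluence))
    # Join on time so every row, including duplicated timepoints, gets its smoothed value
    data <- full_join(data, smooth, by = c("time" = "x"))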
    
    ggplot(data = data) +
        geom_point(aes(x = time, y = confluence)) + 
        geom_smooth(aes(x = time, y = confluence)) +
        geom_point(aes(x = time, y = y), color = "red") 
    
    head(data, 10)
    
    # # A tibble: 10 x 3
    #     time confluence     y
    #    <dbl>      <dbl> <dbl>
    #  1     0      14.7  14.7 
    #  2     0      19.7  14.7 
    #  3     6       2.90  8.72
    #  4     6       3.48  8.72
    #  5    12       3.47  5.10
    #  6    12       4.12  5.10
    #  7    18       4.27  4.49
    #  8    18       4.53  4.49
    #  9    24       4.84  5.30
    # 10    24       5.58  5.30
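
Since the question mentions a group_by() %>% nest() workflow, here is a minimal sketch of how the join approach above might be applied per group. The grouping column well, the helper add_smooth and the toy data are hypothetical and only serve as illustration. The sketch uses left_join rather than full_join so that the result keeps exactly the rows of the input.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical helper: smooth one group's data frame and map the result
# back onto every row by matching on time.
add_smooth <- function(df) {
  # supsmu() returns x (sorted, duplicates removed) and y (smoothed values)
  sm <- as.data.frame(supsmu(df$time, df$confluence))
  left_join(df, sm, by = c("time" = "x")) %>%
    rename(smooth = y)
}

# Toy input: two hypothetical groups with different numbers of rows
times_a <- rep(seq(0, 138, by = 6), each = 2)       # 48 rows, 2 per timepoint
times_b <- rep(seq(0, 114, by = 6), each = 2)[-5]   # 39 rows, one replicate missing

toy <- tibble(
  well = c(rep("A", length(times_a)), rep("B", length(times_b))),
  time = c(times_a, times_b)
) %>%
  # Made-up sigmoid curve with a fixed +/- 1 offset between replicates
  mutate(confluence = 100 / (1 + exp(-(time - 80) / 12)) + rep_len(c(-1, 1), n()))

result <- toy %>%
  group_by(well) %>%
  nest() %>%
  mutate(data = map(data, add_smooth)) %>%
  unnest(data) %>%
  ungroup()
```

Because the smoothed values are matched by time within each group, the groups do not need the same number of rows or the same set of timepoints.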
    

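As a possible alternative that avoids the join, the fitted values of a loess model can be pulled out as a plain vector with predict(). This is only a sketch on made-up toy data. It assumes there are no missing values in time or confluence, so the fitted vector has one entry per row, and it uses loess's default span, which may not suit every series.

```r
library(dplyr)

# Toy data: two replicates per timepoint (values are made up)
toy <- tibble(
  time       = rep(seq(0, 54, by = 6), each = 2),
  confluence = c(3.0, 3.4, 3.6, 4.1, 4.3, 4.5, 4.8, 5.6, 5.9, 6.3,
                 7.1, 7.8, 8.5, 10.2, 11.1, 13.3, 18.5, 12.0, 24.5, 14.6)
)

# Response on the left of the formula, predictor on the right
fit <- loess(confluence ~ time, data = toy)

# With no newdata, predict() returns the fitted value for each row used in
# the fit, so replicated timepoints receive the same smoothed value.
toy <- toy %>%
  mutate(smooth = predict(fit))
```

Note that loess and supsmu are different smoothers, so the smoothed values will generally not match those from the join-based solution.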