
Approx function with group_by and across in R


I am currently interpolating a time series and need to use the approx function on a data frame with 4 columns and 172660 rows in 4 groups (43165 rows per group). There are currently two answers about this: one using summarise, but interpolating just one column, and one using data.table. The first approach works, but not for my purpose. I also noticed that mutate_at, for example, is superseded by mutate(across()), so I tried a more up-to-date approach, but it's not working.

library(tidyverse)
tabela_1 <- tibble(x1 = rnorm(4800, mean = 88.5, sd = 4),
                   x2 = rnorm(4800, mean = -38.526, sd = 2.758),
                   x3 = rnorm(4800, mean = -22.6852, sd = 1.8652),
                   x4 = rnorm(4800, mean = -38.526, sd = 2.758),
                   tmpts = rep(x = seq(from = 0, to = 863.28, by = 0.72), 
                               times = 4),
                   category = rep(x = 1:4, each = 1200))
tabela <- tibble(tmpts = rep(x = seq(from = 0, to = 863.28, by = 0.02), 
                             times = 4),
                 category = rep(x = 1:4, each = 43165))
        
tabela_joined <- tabela %>% 
            left_join(tabela_1, by = c("tmpts", "category")) %>% 
            arrange(category, tmpts) %>% 
            janitor::clean_names()
        
tabela_interpolation <- tabela_joined %>% 
            group_by(category) %>%
            summarize(across(.cols = x1:x4, approx(., n = 43165)))

When running tabela_interpolation, I receive:

Error: Problem with `summarise()` input `..1`.
i `..1 = across(.cols = x1:x15, approx(., n = 43165))`.
x Can't convert an integer vector to function
i The error occurred in group 1: run = 1.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
In regularize.values(x, y, ties, missing(ties), na.rm = na.rm) :
  collapsing to unique 'x' values

How should I use summarise with across to get the interpolated time series from the approx function for each column of the data frame?


Solution

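  • across expects a function (or a purrr-style lambda) as its second argument, whereas approx(., n = 43165) is a call that gets evaluated before across can apply it to each column. Pass the function itself and supply n as an extra argument: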

    library(tidyverse)
    
    tabela_joined %>% 
      group_by(category) %>%
      summarize(across(x1:x4, approx, n = 43165)) %>%
      ungroup()
    

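    Or use a lambda, which is the preferred form in dplyr 1.1.0 and later, where passing extra arguments through across is deprecated: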

    tabela_joined %>% 
      group_by(category) %>%
      summarize(across(x1:x4, ~approx(., n = 43165))) %>%
      ungroup()
    

    Each summarised column is a list column holding the x and y components that approx returns. Follow with unnest to expand them into a regular data frame:

    tabela_joined %>% 
      group_by(category) %>%
      summarize(across(x1:x4, approx, n = 43165)) %>%
      ungroup() %>%
      unnest(x1:x4)
    
    #   category    x1    x2    x3    x4
    #      <int> <dbl> <dbl> <dbl> <dbl>
    # 1        1     1     1     1     1
    # 2        1     2     2     2     2
    # 3        1     3     3     3     3
    # 4        1     4     4     4     4
    # 5        1     5     5     5     5
    # 6        1     6     6     6     6
    # 7        1     7     7     7     7
    # 8        1     8     8     8     8
    # 9        1     9     9     9     9
    #10        1    10    10    10    10
    # … with 345,310 more rows
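
Because approx returns a list with components x and y, the unnested result above stacks the x positions (the 1, 2, 3, ... shown) on top of the interpolated values within each category. If only the interpolated values are wanted, one option is to take `$y` inside the lambda and use mutate, so the number of rows stays the same. The following is a minimal sketch under stated assumptions, not the method from the answer above: the data frame `toy` and its values are hypothetical, and it assumes tmpts should serve as the x axis for the interpolation.

```r
library(dplyr)

# Hypothetical toy data: two groups, values known only at some time points.
toy <- tibble(
  category = rep(1:2, each = 5),
  tmpts    = rep(0:4, times = 2),
  x1       = c(0, NA, 2, NA, 4, 10, NA, NA, NA, 14),
  x2       = c(1, NA, NA, NA, 5, 0, NA, 4, NA, 8)
)

# Interpolate each column within each group against tmpts.
# approx() drops the NA pairs and interpolates linearly between the
# remaining points; $y keeps only the interpolated values.
toy_interp <- toy %>%
  group_by(category) %>%
  mutate(across(x1:x2, ~ approx(tmpts, .x, xout = tmpts)$y)) %>%
  ungroup()

toy_interp
```

With the question's data, the same pattern would apply to tabela_joined with x1:x4. Each group needs at least two non-missing values per column for approx to work.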