Tags: r, list, tidyverse, interpolation, purrr

Is it possible to interpolate a list of data frames in R?


Based on lhs's answer (https://stackoverflow.com/a/72467827/11124121):

#From lhs
library(tidyverse)
data("population")

# create some data to interpolate
population_5 <- population %>% 
  filter(year %% 5 == 0) %>% 
  mutate(female_pop = population / 2,
         male_pop = population / 2)

interpolate_func <- function(variable, data) {
  data %>% 
    group_by(country) %>% 
    # can't interpolate if only one year
    filter(n() >= 2) %>% 
    group_modify(~as_tibble(approx(.x$year, .x[[variable]], 
                                   xout = min(.x$year):max(.x$year)))) %>% 
    set_names(c("country", "year", paste0(variable, "_interpolated"))) %>% 
    ungroup()
}
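For readers less familiar with group_by() + group_modify(), here is a rough base-R sketch of what interpolate_func computes for one variable. It splits the data by country, drops countries with fewer than two years (approx() needs at least two points), and calls approx() over the full year range. The toy data frame and the helper name interp_by_country are hypothetical and only for illustration; this is not lhs's code.

```r
# Hypothetical toy data in the same long shape as population_5
toy <- data.frame(
  country    = c("A", "A", "A", "B"),
  year       = c(2000, 2005, 2010, 2000),
  population = c(100, 150, 250, 40)
)

# Hypothetical base-R analogue of interpolate_func (one variable at a time)
interp_by_country <- function(variable, data) {
  groups <- split(data, data$country)
  # Like filter(n() >= 2): a country with a single year cannot be interpolated
  groups <- groups[vapply(groups, nrow, integer(1)) >= 2]
  pieces <- lapply(names(groups), function(cty) {
    g <- groups[[cty]]
    # Linear interpolation at every year between the first and last observation
    res <- approx(g$year, g[[variable]], xout = min(g$year):max(g$year))
    out <- data.frame(country = cty, year = res$x, value = res$y)
    names(out)[3] <- paste0(variable, "_interpolated")
    out
  })
  do.call(rbind, pieces)
}

res <- interp_by_country("population", toy)
```

Country B is dropped because it has only one year, and country A gets one row per year from 2000 to 2010.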

However, the years that already have data (e.g. 2000 and 2005) are also interpolated. I want to keep the original data and only interpolate the missing years, that is:

2001-2004 and 2006-2009.
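With linear interpolation, approx() evaluated at an observed year returns the observed value, so the existing years are not actually changed. If you still want to touch only the missing years, one possible base-R sketch (single hypothetical country, hypothetical toy data) is to build the full year grid, keep the observed values, and fill just the NA rows:

```r
# Hypothetical toy data for one country, observed every 5 years
toy <- data.frame(year = c(2000, 2005, 2010), population = c(100, 150, 250))

# Full year grid; observed years keep their original values, the rest are NA
full <- merge(data.frame(year = 2000:2010), toy, by = "year", all.x = TRUE)
missing <- is.na(full$population)

# Interpolate only the missing years (2001-2004 and 2006-2009)
full$population[missing] <- approx(toy$year, toy$population,
                                   xout = full$year[missing])$y

# Flag which rows were filled in rather than observed
full$imputed <- missing
```

The imputed column makes it easy to tell original rows from filled ones later.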

Therefore, I would like to construct a list:

population_5_list <- list(
  population_5 %>% filter(year %in% c(2000, 2005)),
  population_5 %>% filter(year %in% c(2005, 2010))
)

and then interpolate each data frame in the list one by one.
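Instead of writing each window by hand, the list of consecutive year pairs could be generated from the observed years. A minimal base-R sketch, using Map() where a tidyverse version would use purrr::map2(); the toy data is hypothetical:

```r
# Hypothetical toy data with four observed years
toy <- data.frame(year = c(2000, 2005, 2010, 2015),
                  population = c(100, 150, 250, 300))

breaks <- sort(unique(toy$year))

# One data frame per pair of consecutive observed years:
# (2000, 2005), (2005, 2010), (2010, 2015)
toy_list <- Map(function(from, to) toy[toy$year %in% c(from, to), ],
                head(breaks, -1), tail(breaks, -1))
```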

However, I get this error:

Error in UseMethod("group_by") :
no applicable method for 'group_by' applied to an object of class "list"

How can I adapt interpolate_func with purrr so that it is applied to each data frame in the list?
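The error arises because group_by() receives the list itself rather than a data frame. The general fix is to call the function once per list element. A base-R sketch of that idea with lapply() (the purrr counterpart is map()); interp_one is a hypothetical stand-in for interpolate_func and the toy data is made up:

```r
# Hypothetical single-data-frame function, standing in for interpolate_func
interp_one <- function(variable, data) {
  res <- approx(data$year, data[[variable]],
                xout = min(data$year):max(data$year))
  out <- data.frame(year = res$x, value = res$y)
  names(out)[2] <- paste0(variable, "_interpolated")
  out
}

toy <- data.frame(year = c(2000, 2005, 2010), population = c(100, 150, 250))
toy_list <- list(toy[1:2, ], toy[2:3, ])

# interp_one("population", toy_list) would stop with an error: toy_list is a
# list of data frames, so toy_list$year is NULL. Apply it per element instead:
res <- lapply(toy_list, function(df) interp_one("population", df))
```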


Solution

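  • We need to loop over the list with map, running the per-variable interpolation on each data frame (.x) and joining the results by country and year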

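    library(purrr)
    library(dplyr)

    # Columns to interpolate (the three shown in the output below)
    vars_to_interpolate <- c("population", "female_pop", "male_pop")

    # For each data frame in the list, interpolate every variable,
    # then join the per-variable results by country and year
    map(population_5_list,
        ~ map(vars_to_interpolate, interpolate_func, data = .x) %>%
            reduce(full_join, by = c("country", "year")))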
    

    Output:

    [[1]]
    # A tibble: 1,266 × 5
       country      year population_interpolated female_pop_interpolated male_pop_interpolated
       <chr>       <int>                   <dbl>                   <dbl>                 <dbl>
     1 Afghanistan  2000               20595360                10297680              10297680 
     2 Afghanistan  2001               21448459                10724230.             10724230.
     3 Afghanistan  2002               22301558                11150779              11150779 
     4 Afghanistan  2003               23154657                11577328.             11577328.
     5 Afghanistan  2004               24007756                12003878              12003878 
     6 Afghanistan  2005               24860855                12430428.             12430428.
     7 Albania      2000                3304948                 1652474               1652474 
     8 Albania      2001                3283184.                1641592.              1641592.
     9 Albania      2002                3261421.                1630710.              1630710.
    10 Albania      2003                3239657.                1619829.              1619829.
    # … with 1,256 more rows
    # ℹ Use `print(n = ...)` to see more rows
    
    [[2]]
    # A tibble: 1,278 × 5
       country      year population_interpolated female_pop_interpolated male_pop_interpolated
       <chr>       <int>                   <dbl>                   <dbl>                 <dbl>
     1 Afghanistan  2005               24860855                12430428.             12430428.
     2 Afghanistan  2006               25568246.               12784123.             12784123.
     3 Afghanistan  2007               26275638.               13137819.             13137819.
     4 Afghanistan  2008               26983029.               13491515.             13491515.
     5 Afghanistan  2009               27690421.               13845210.             13845210.
     6 Afghanistan  2010               28397812                14198906              14198906 
     7 Albania      2005                3196130                 1598065               1598065 
     8 Albania      2006                3186933.                1593466.              1593466.
     9 Albania      2007                3177735.                1588868.              1588868.
    10 Albania      2008                3168538.                1584269.              1584269.
    # … with 1,268 more rows
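Note that the boundary year 2005 appears in both list elements. If a single data frame is wanted afterwards, the results can be stacked and the repeated country-year rows dropped; in the tidyverse that would be bind_rows() followed by distinct(country, year, .keep_all = TRUE). A base-R sketch on hypothetical per-window results shaped like the tibbles above:

```r
# Hypothetical per-window results, one country, overlapping at 2005
res_list <- list(
  data.frame(country = "A", year = 2000:2005,
             population_interpolated = seq(100, 150, by = 10)),
  data.frame(country = "A", year = 2005:2010,
             population_interpolated = seq(150, 250, by = 20))
)

combined <- do.call(rbind, res_list)

# 2005 appears in both windows; keep the first row per country-year
combined <- combined[!duplicated(combined[c("country", "year")]), ]
rownames(combined) <- NULL
```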