Based on the answer by lhs, https://stackoverflow.com/a/72467827/11124121:
#From lhs
library(tidyverse)
data("population")
# create some data to interpolate
population_5 <- population %>%
  filter(year %% 5 == 0) %>%
  mutate(female_pop = population / 2,
         male_pop = population / 2)

interpolate_func <- function(variable, data) {
  data %>%
    group_by(country) %>%
    # can't interpolate if only one year
    filter(n() >= 2) %>%
    group_modify(~ as_tibble(approx(.x$year, .x[[variable]],
                                    xout = min(.x$year):max(.x$year)))) %>%
    set_names(c("country", "year", paste0(variable, "_interpolated"))) %>%
    ungroup()
}
The years that already exist, e.g. 2000 and 2005, are also run through the interpolation. I want to keep the original data and interpolate only the missing years, that is, 2001-2004 and 2006-2009.
Therefore, I would like to construct a list:
population_5_list <- list(
  population_5 %>% filter(year %in% c(2000, 2005)),
  population_5 %>% filter(year %in% c(2005, 2010))
)
and interpolate the data frames in the list one by one.
However, an error appeared:
Error in UseMethod("group_by") :
no applicable method for 'group_by' applied to an object of class "list"
How can I change interpolate_func into purrr style so that it can be applied to a list?
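The error comes from calling group_by() on the list itself rather than on the data frames inside it. A minimal sketch of the difference, using a hypothetical toy_list with made-up values as a stand-in for population_5_list:

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for population_5_list (values are made up)
toy_list <- list(
  data.frame(country = c("A", "A"), year = c(2000L, 2005L), population = c(10, 20)),
  data.frame(country = c("A", "A"), year = c(2005L, 2010L), population = c(20, 40))
)

# group_by() has methods for data frames, not for a plain list,
# so calling it on the list raises an error; capture the message instead of stopping
result <- tryCatch(
  group_by(toy_list, country),
  error = function(e) conditionMessage(e)
)

# Calling group_by() on each element of the list works
grouped <- map(toy_list, ~ group_by(.x, country))
```

Here result holds the error message, while grouped is a list of grouped data frames, one per period.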
We need to loop over the list with map, applying interpolate_func to each data frame in turn:
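library(purrr)
library(dplyr)

# Assumed definition (not shown above): the columns to interpolate,
# inferred from the column names in the output
vars_to_interpolate <- c("population", "female_pop", "male_pop")

map(population_5_list,
    ~ map(vars_to_interpolate, interpolate_func, data = .x) %>%
      reduce(full_join, by = c("country", "year")))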
-output
[[1]]
# A tibble: 1,266 × 5
country year population_interpolated female_pop_interpolated male_pop_interpolated
<chr> <int> <dbl> <dbl> <dbl>
1 Afghanistan 2000 20595360 10297680 10297680
2 Afghanistan 2001 21448459 10724230. 10724230.
3 Afghanistan 2002 22301558 11150779 11150779
4 Afghanistan 2003 23154657 11577328. 11577328.
5 Afghanistan 2004 24007756 12003878 12003878
6 Afghanistan 2005 24860855 12430428. 12430428.
7 Albania 2000 3304948 1652474 1652474
8 Albania 2001 3283184. 1641592. 1641592.
9 Albania 2002 3261421. 1630710. 1630710.
10 Albania 2003 3239657. 1619829. 1619829.
# … with 1,256 more rows
# ℹ Use `print(n = ...)` to see more rows
[[2]]
# A tibble: 1,278 × 5
country year population_interpolated female_pop_interpolated male_pop_interpolated
<chr> <int> <dbl> <dbl> <dbl>
1 Afghanistan 2005 24860855 12430428. 12430428.
2 Afghanistan 2006 25568246. 12784123. 12784123.
3 Afghanistan 2007 26275638. 13137819. 13137819.
4 Afghanistan 2008 26983029. 13491515. 13491515.
5 Afghanistan 2009 27690421. 13845210. 13845210.
6 Afghanistan 2010 28397812 14198906 14198906
7 Albania 2005 3196130 1598065 1598065
8 Albania 2006 3186933. 1593466. 1593466.
9 Albania 2007 3177735. 1588868. 1588868.
10 Albania 2008 3168538. 1584269. 1584269.
# … with 1,268 more rows
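
If a single data frame is wanted afterwards, the list elements can be stacked. Note that the boundary year (2005 here) appears in both pieces, so one copy has to be dropped. A minimal sketch, assuming a hypothetical interpolated_list with made-up values in the same shape as the result of the map() call above:

```r
library(dplyr)

# Hypothetical stand-in for the list returned by the map() call above
interpolated_list <- list(
  data.frame(country = "A", year = 2000:2005,
             population_interpolated = seq(10, 20, length.out = 6)),
  data.frame(country = "A", year = 2005:2010,
             population_interpolated = seq(20, 40, length.out = 6))
)

combined <- interpolated_list %>%
  bind_rows() %>%
  # the boundary year is present in both pieces; keep the first copy only
  distinct(country, year, .keep_all = TRUE) %>%
  arrange(country, year)
```

distinct() keeps the first row per country and year, which is safe here because both pieces carry the same value at the shared year.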