Search code examples
rdplyrnestedlubridate

Applying a function to a nested list


I have a data frame that contains nested list based on ID. I am trying to apply a function to the nested list within this data frame, but I am running into this error:

Error in make_track(tbl = x, .x = x, .y = y, .t = date, uid = ID, crs = sp::CRS("+init=epsg:32612")) : Non existent columns from tbl were requested.

Here is my reproducible example. I was wondering what the best way to apply a function to a nested list might be, and how I can go about fixing this error. Do I have to do a double lapply to fix this problem?

set.seed(12345)
library(lubridate)
library(dplyr)
library(amt)

f = function(data){
  data %>% mutate(
    new = floor_date(data$date, "10 days"),
    new = if_else(day(new) == 31, new - days(10), new)
  ) %>% 
    group_split(new)
}

nested <- tibble(
  ID = rep(c("A","B","C","D", "E"), 100),
  date = rep_len(seq(dmy("01-01-2010"), dmy("31-12-2013"), by = "days"), 500),
  x = runif(length(date), min = 60000, max = 80000),
  y = runif(length(date), min = 800000, max = 900000)
) %>% group_by(ID) %>% 
  nest() %>% 
  mutate(data = map(data, f))


track_list <- lapply(nested, function (x){
  make_track(tbl = x, .x = x, .y = y, .t = date,
             uid = ID,
             # lat/long: 4326 (lat/long, WGS84 datum).
             # utm: crs = sp::CRS("+init=epsg:32612"))
             crs = sp::CRS("+init=epsg:32612"))
})

Solution

  • The issue is that the data is nested, so we need to do one more level inside to pick up the data. Also, the make_track requires all columns to be in the same data object, so we need to create the corresponding uid from the 'ID' column of nested object

    library(purrr)
    library(dplyr)
    library(amt)
    out <- map2_dfr(nested$ID, nested$data, function(z, lst1)
        map_dfr(lst1, ~ {
               dat <- .x %>% 
                   mutate(ID = z)
          make_track(tbl = dat, .x = x, .y = y, .t = date, uid = ID, 
             crs = sp::CRS("+init=epsg:32612"))
          }))
    

    -output

    > out
    # A tibble: 500 x 4
           x_      y_ t_         uid  
        <dbl>   <dbl> <date>     <chr>
     1 74418. 820935. 2010-01-01 A    
     2 63327. 885896. 2010-01-06 A    
     3 60691. 873949. 2010-01-11 A    
     4 69250. 868411. 2010-01-16 A    
     5 69075. 876142. 2010-01-21 A    
     6 67797. 829892. 2010-01-26 A    
     7 75860. 843542. 2010-01-31 A    
     8 67233. 882318. 2010-02-05 A    
     9 75644. 826283. 2010-02-10 A    
    10 66424. 853789. 2010-02-15 A    
    # … with 490 more rows
    

    If we want the output as a nested list, use remove the _dfr

    out <- map2(nested$ID, nested$data, function(z, lst1)
        map(lst1, ~ {
               dat <- .x %>% 
                   mutate(ID = z)
          make_track(tbl = dat, .x = x, .y = y, .t = date, uid = ID, 
             crs = sp::CRS("+init=epsg:32612"))
          }))