Search code examples
rmatrixtidyverselubridate

Calculating differences in nested data frames and dividing across another list of matrices


I have split a data frame by two months (june and july). I then nested these two different data frames based on the ID. The nested data frame contains a column with the ID and a data column.

The data column contains a list, and the list represents the data for the ID that has been split into three 10-day intervals in a month. For example, for the ID A, the list shows [[1]] as the first 10-days, [[2]] as the second 10-days, and [[3]] as the third 10-days within a month.

For the next component, I want to go down each list for each ID and calculate the difference between the minimum jDate in nested_june and nested_july, as shown below for n1, n2,and n3. These differences are then combined into a matrix, m1.

Finally, I have a list of two matrices l1, and I would like to divide each matrices in the list by m1.

Is there a more efficient way calculate the differences and the division of matrices in list?

library(lubridate)
library(dplyr)
library(tidyr)
library(purrr)

f = function(data){
  data %>% mutate(
    new = floor_date(data$date, "10 days"),
    new = if_else(day(new) == 31, new - days(10), new)
  ) %>% 
    group_split(new)
}

ID <-  rep(c("A","B","C", "D"), 1000)
date <-  rep_len(seq(dmy("01-01-2010"), dmy("31-12-2013"), by = "days"), 500)
x <-  runif(length(date), min = 60000, max = 80000)
y <-  runif(length(date), min = 800000, max = 900000)

df <- data.frame(date = date, 
                 x = x,
                 y =y,
                 ID)

df$jDate <- julian(as.Date(df$date), origin = as.Date("1970-01-01"))
df$Month <- month(df$date)

df_june <- filter(df, Month == c("6"))
df_july <- filter(df, Month == c("7"))

nested_june <- tibble(
  df_june
) %>% group_by(ID) %>%
  nest() %>% 
  mutate(data = map(data, f))

nested_july <- tibble(
  df_july
) %>% group_by(ID) %>%
  nest() %>% 
  mutate(data = map(data, f))

# Create list of matrices
t1 <- c(100,150,200)
t2 <- c(200,250,350)
t3 <- c(300,350, 400)
mat <- cbind(t1,t2, t3)

t1 <- c(150,150,200)
t2 <- c(250,250,350)
t3 <- c(350,350, 400)
mat2 <- cbind(t1,t2, t3)

l1 <- list(list(mat), list(mat2))

## Hoping to get a function for everything below here ##

# Calculate difference in days from the first day of one interval to the first 
# day of the second interval and repeat with the other intervals. 

n1 <- c(((min(nested_july[[2]][[1]][[1]]$jDate))- min(nested_june[[2]][[1]][[1]]$jDate)),
        ((min(nested_july[[2]][[1]][[1]]$jDate))- min(nested_june[[2]][[1]][[2]]$jDate)),
        ((min(nested_july[[2]][[1]][[1]]$jDate))- min(nested_june[[2]][[1]][[3]]$jDate)))

n2 <- c(((min(nested_july[[2]][[1]][[2]]$jDate))- min(nested_june[[2]][[1]][[1]]$jDate)),
        ((min(nested_july[[2]][[1]][[2]]$jDate))- min(nested_june[[2]][[1]][[2]]$jDate)),
        ((min(nested_july[[2]][[1]][[2]]$jDate))- min(nested_june[[2]][[1]][[3]]$jDate)))

n3 <-  c(((min(nested_july[[2]][[1]][[3]]$jDate))- min(nested_june[[2]][[1]][[1]]$jDate)),
         ((min(nested_july[[2]][[1]][[3]]$jDate))- min(nested_june[[2]][[1]][[2]]$jDate)),
         ((min(nested_july[[2]][[1]][[3]]$jDate))- min(nested_june[[2]][[1]][[3]]$jDate)))
m1 <- cbind(n1,n2,n3)


# Expected output as matrices
l1[[1]][[1]]/m1

l1[[2]][[1]]/m1



Solution

  • Try with lapply

    lapply(l1, function(sub) {sub <- lapply(sub, `/`, m1)
        sub})
    

    -output

    [[1]]
    [[1]][[1]]
                t1        t2        t3
    [1,]  3.333333  4.761905  6.000000
    [2,]  8.333333  8.333333  9.210526
    [3,] 20.000000 15.909091 13.333333
    
    
    [[2]]
    [[2]][[1]]
                t1        t2        t3
    [1,]  5.000000  5.952381  7.000000
    [2,]  8.333333  8.333333  9.210526
    [3,] 20.000000 15.909091 13.333333
    

    TO create the matrix, we can do

    library(tidyr)
    library(purrr)
    library(dplyr)
    m2 <-  crossing(i1 = seq_len(ncol(l1[[1]][[1]])),
           i2 = seq_len(ncol(l1[[1]][[1]]))) %>% 
       transmute(new =map2_dbl(i1, i2, 
         ~ min(nested_july[[2]][[1]][[.x]]$jDate) - 
            min(nested_june[[2]][[1]][[.y]]$jDate))) %>% 
       pull(new) %>%
       matrix(ncol = 3)
    

    -checking

    
    > m2
          [,1] [,2] [,3]
    [1,]   30   42   50
    [2,]   18   30   38
    [3,]   10   22   30
    

    OP's 'm1

    > m1
         n1 n2 n3
    [1,] 30 42 50
    [2,] 18 30 38
    [3,] 10 22 30