Search code examples
rdplyrdbplyr

How can I expand grouped data in R using dplyr?


Is it possible to achieve the result of the following code in a more efficient and concise way? Due to the nature of my task, I cannot use base or tidyr functions, because the dplyr code needs to be translated to SQL and executed on a database.

library(dplyr)
library(dbplyr)
library(RSQLite)
library(DBI)

# Create example data set
id <- c("a", "b", "c")
df <- data.frame(id)

# Treat it as a data base table
con <- DBI::dbConnect(RSQLite::SQLite(), path = ":memory:")
copy_to(con, df, "data", temporary = FALSE)

# Expand to data set and create a variable for four quarters
n <- 4

data <- tbl(con, "data") %>%
    mutate(quarter = 1)

for (i in 2:n) {
    data <- data %>%
        mutate(quarter = i) %>%
        union(data, data) %>%
        show_query()
}

data <- collect(data)

My goal in my real life example is to query a list of IDs and to expand it to a data set with a variable "quarter". I want to use that list as a basis to successively join more information later.


Solution

  • It sounds like you want the Cartesian product of id = c('a', 'b', 'c') and quarters = c(1, 2, 3, 4), which would give you id_quarter = c(('a',1), ('a',2), ('a',3), ..., ('c',4)).

    This can be done using a join on a dummy variable like so:

    id <- c("a", "b", "c")
    df <- data.frame(id)
    quarter <- c(1, 2, 3, 4)
    df_q <- data.frame(quarter)
    
    # Treat it as a data base table
    con <- DBI::dbConnect(RSQLite::SQLite(), path = ":memory:")
    copy_to(con, df, "data", temporary = FALSE)
    copy_to(con, df_q, "quarter", temporary = FALSE)
    
    # create placeholder column
    data <- tbl(con, "data") %>%
        mutate(dummy_placeholder = 1)
    quarters <- tbl(con, "quarter") %>%
        mutate(dummy_placeholder = 1)
    
    # join and collect
    result <- data %>%
        inner_join(quarter, by = "dummy_placeholder") %>%
        collect()