Search code examples
sqlrsqlitefor-looprsqlite

Using a list of IDs to combine SQL tables into a data frame in R


I have a list of IDs that are not associated with any actual data. I have a SQL database that has a table for each of these IDs, and those tables have data that I would like to combine together into one large data frame based on the list of IDs that I have. I figured a for loop would be needed for this, but I haven't been able to get it to work properly.

For example I have a list of IDs" 1,2,3,4,5

I have a SQL database with tables for each of these, and they also have other data associated with the IDs. Each ID has multiple rows and columns.

I would like my end product to be the combination of those rows and columns for the list of IDs to be in a single data frame in r. How could I do this? What is the most efficient way to do so?

#Example data set
library(lubridate)
date <- rep_len(seq(dmy("26-12-2010"), dmy("20-12-2011"), by = "days"), 500)
ID <- rep(seq(1, 5), 100)

df <- data.frame(date = date,
                 x = runif(length(date), min = 60000, max = 80000),
                 y = runif(length(date), min = 800000, max = 900000),
                 ID)

for (i in 1: length(ID)){
  ID[i] <- dbReadTable(mydb, ID[i])
}

Thank you so much for your time.


Solution

  • I'll expand on my comment to finish the question.

    IDs <- lapply(setNames(nm=ID), function(i) dbReadTable(mydb, i)) 
    

    and then one of:

    ## base R
    IDs <- Map(function(x, nm) transform(x, id = nm), IDs, names(IDs))
    DF <- do.call(rbind, IDs)
    
    ## dplyr
    DF <- dplyr::bind_rows(IDs, .id = "id")
    
    ## data.table
    DF <- data.table::rbindlist(IDs, idcol = "id")
    

    The addition of the "id" column is to easily differentiate the rows based on the source ID. If the table already includes that, then you can omit the Map (base) and .id/idcol arguments.

    (This assumes, btw, that all tables have the same exact structure: same column names and same data types.)