Search code examples
rtidyverserpostgresql

Parametrize query in RPostgres and append results in a new dataframe


I have a set of pair values stored in a dataframe parameters:

parameters <- data.frame(
   variant_id = c(1, 2, 3, 4, 5),
   start_date = c("2019-07-01", "2019-09-05", "2019-05-21", "2019-09-06",
                  "2019-04-19"))

> parameters
  variant_id start_date
1          1 2019-07-01
2          2 2019-09-05
3          3 2019-05-21
4          4 2019-09-06
5          5 2019-04-19

I want to use this combinations of variant_id and start_date as dynamic parameters in this SQL query performed in RPostgres.

library(RPostgres)
library(tidyverse)

query <- "select sum(o.quantity)
from orders o
where o.date >= << start_date >>
and o.variant_id = << variant_id >> "

df <- dbGetQuery(db, query)

I will have then queries like:

query_1 <- "select sum(o.quantity)
from orders o
where o.date >= '2019-07-01'
and o.variant_id = 1 "

result_1 <- dbGetQuery(db, query_1)
 > result_1
     sum
   1 100

query_2 <- "select sum(o.quantity)
from orders o
where o.date >= '2019-09-05'
and o.variant_id = 2 "

result_2 <- dbGetQuery(db, query_2)
 > result_2
     sum
   1 120


query_3 <- "select sum(o.quantity)
from orders o
where o.date >= '2019-05-21'
and o.variant_id = 3 "

result_3 <- dbGetQuery(db, query_3)
 > result_3
     sum
   1 140

... and so on.

Then, I would like to append each result in a new dataframe results as:

results <- data.frame(
              variant_id = c(1, 2, 3, 4, 5),
                quantity = c(100, 120, 140, 150, 160)
           )

> results
  variant_id quantity
1          1      100
2          2      120
3          3      140
4          4      150
5          5      160

How can I approach this problem using RPostgres and dplyr avoiding the use of a loop?


Solution

  • We don't have your database but have used parameters and orders given in the Note at the end. We have added stringsAsFactors = FALSE to the parameters definition to ensure that we have character strings.

    Now, use sprintf to create a character vector of queries. Then run each one. Here we use sqldf to make everything reproducible since we don't have your data base but you can replace sqldf with the appropriate statement to get the result from your database.

    query <- "select sum(o.quantity)
      from orders o
      where o.date >= '%s'
      and o.variant_id = %s "
    
    queries <- with(parameters, sprintf(query, start_date, variant_id))
    
    library(sqldf)
    
    # replace sqldf in next line with appropriate function to invoke query
    do.call("rbind", lapply(queries, sqldf))
    ##   sum(o.quantity)
    ## 1               1
    ## 2              NA
    ## 3               3
    ## 4              NA
    ## 5              NA
    

    Note

    # test data
    
    parameters <- data.frame(
       variant_id = c(1, 2, 3, 4, 5),
       start_date = c("2019-07-01", "2019-09-05", "2019-05-21", "2019-09-06",
                      "2019-04-19"), stringsAsFactors = FALSE)
    
    orders <- data.frame(date = "2019-07-02", variant_id = 1:3, quantity = 1:3)