I have a set of pair values stored in a dataframe parameters
:
parameters <- data.frame(
variant_id = c(1, 2, 3, 4, 5),
start_date = c("2019-07-01", "2019-09-05", "2019-05-21", "2019-09-06",
"2019-04-19"))
> parameters
variant_id start_date
1 1 2019-07-01
2 2 2019-09-05
3 3 2019-05-21
4 4 2019-09-06
5 5 2019-04-19
I want to use this combinations of variant_id
and start_date
as dynamic parameters in this SQL query performed in RPostgres.
library(RPostgres)
library(tidyverse)
query <- "select sum(o.quantity)
from orders o
where o.date >= << start_date >>
and o.variant_id = << variant_id >> "
df <- dbGetQuery(db, query)
I will have then queries like:
query_1 <- "select sum(o.quantity)
from orders o
where o.date >= '2019-07-01'
and o.variant_id = 1 "
result_1 <- dbGetQuery(db, query_1)
> result_1
sum
1 100
query_2 <- "select sum(o.quantity)
from orders o
where o.date >= '2019-09-05'
and o.variant_id = 2 "
result_2 <- dbGetQuery(db, query_2)
> result_2
sum
1 120
query_3 <- "select sum(o.quantity)
from orders o
where o.date >= '2019-05-21'
and o.variant_id = 3 "
result_3 <- dbGetQuery(db, query_3)
> result_3
sum
1 140
... and so on.
Then, I would like to append each result in a new dataframe results
as:
results <- data.frame(
variant_id = c(1, 2, 3, 4, 5),
quantity = c(100, 120, 140, 150, 160)
)
> results
variant_id quantity
1 1 100
2 2 120
3 3 140
4 4 150
5 5 160
How can I approach this problem using RPostgres
and dplyr
avoiding the use of a loop?
We don't have your database but have used parameters
and orders
given in the Note at the end. We have added stringsAsFactors = FALSE
to the parameters
definition to ensure that we have character strings.
Now, use sprintf
to create a character vector of queries. Then run each one. Here we use sqldf
to make everything reproducible since we don't have your data base but you can replace sqldf
with the appropriate statement to get the result from your database.
query <- "select sum(o.quantity)
from orders o
where o.date >= '%s'
and o.variant_id = %s "
queries <- with(parameters, sprintf(query, start_date, variant_id))
library(sqldf)
# replace sqldf in next line with appropriate function to invoke query
do.call("rbind", lapply(queries, sqldf))
## sum(o.quantity)
## 1 1
## 2 NA
## 3 3
## 4 NA
## 5 NA
# test data
parameters <- data.frame(
variant_id = c(1, 2, 3, 4, 5),
start_date = c("2019-07-01", "2019-09-05", "2019-05-21", "2019-09-06",
"2019-04-19"), stringsAsFactors = FALSE)
orders <- data.frame(date = "2019-07-02", variant_id = 1:3, quantity = 1:3)