Search code examples
rinner-joinknitrr-markdowncommon-table-expression

Does SQL chunk in Rmarkdown/knitr on Redshift support INNER JOIN and Common Table Expressions?


Really enjoyed using Rmarkdown/Knitr execution of SQL chunks recently, however it seems there is a limitation for more complex queries. I've been using a Redshift data base with an RJDBC powered connection.

Specifically using Common Table expressions with INNER JOIN:

```{sql, connection=redshift, output.var="Field_count"}
WITH
    cte AS (
      SELECT DISTINCT field
      FROM
        table
      WHERE date >= '2017-01-01'
  )SELECT count(DISTINCT field)
FROM cte
INNER JOIN table_2 t2 ON t2.join_here = cte.join_here;
```

I've successfully used CTE and INNER JOIN in individual queries, but combining them leads to an error:

Error in .verify.JDBC.result(r, "Unable to retrieve JDBC results set for ", : Unable to retrieve JDBC result set for WITH
        cte AS (
          SELECT DISTINCT field
          FROM
            table
          WHERE date >= '2017-01-01'
      )SELECT count(DISTINCT field)
    FROM cte
    INNER JOIN table_2 t2 ON t2.join_here = cte.join_here; ([Amazon] (500335) One query is expected: WITH
        cte AS (
          SELECT DISTINCT field
          FROM
            table
          WHERE date >= '2017-01-01'
      )SELECT count(DISTINCT field)
    FROM cte
    INNER JOIN table_2 t2 ON t2.join_here = cte.join_here;) Calls: <Anonymous> ... dbSendQuery -> dbSendQuery -> .local -> .verify.JDBC.reslt Execution Halted

It appears that this is being translated wrongly/duplicated on the querying out of the RStudio session.

Is this a known limitation? If so is it documented anywhere?

Is this a known bug? If so is it in an existing issue?

EDIT: This is only visibly happening within the knit process, individually running chunks interactively returns expected results.


Solution

  • I'd suggest trying another version on Redshift JDBC driver.

    I got the same error, however, it was while using Pentaho so mileage may vary. I didn't find any documentation or other info related to this. My problem got solved by switching away from the newest driver version. I was using RedshiftJDBC4-1.2.1.1001.jar (JDBC 4.0 compatible) and switched to RedshiftJDBC4-1.1.10.1010.jar.