Search code examples
sqlrrnotebook

SQL chunks in R - Multiple statements and temp tables


I'm trying to run a SQL script inside an R notebook chunk. However, the final SELECT returns no results if it is preceded by multiple SQL statements (that don't generate result sets) within the same chunk.

  1. I have a functioning connection. The database server runs SQL Server.
  2. My chunks that only contain one statement return output in a dataframe.
  3. However, I also need to do multi-pass SQL in a script. I'd like to create temp tables along the way and use them to drive some aggregations. I want to save the results into a temp table and conclude with a single SELECT statement from the temp table, producing the output from the chunk.
  4. I tried saving the script to a file and using the code attribute for the chunk. That just got me an error about an incomplete final line. I don't require a separate file. I was just trying another way to hack this, but I couldn't make that work either.

It hasn't worked yet, and I have been unable to trace the problem. Maybe SQL chunks don't even support what I want to make them do. Any ideas?

  • In an R notebook SQL chunk, is it possible to run multiple SQL statements?
  • Is it possible to create a temp table and reference it in a subsequent statement within the same chunk?
  • Is it possible to execute a loop construct within the SQL chunk?

Here are some toy examples.

This chunk works, returning a dataframe.

```{sql test, connection=con, output.var = "test_df"}
select  TABLE_NAME
from    INFORMATION_SCHEMA.TABLES
where   TABLE_SCHEMA = 'dbo';
```

This toy chunk returns nothing.

```{sql test2, connection=con, output.var = "test2_df"}
drop table if exists #tables;
select  TABLE_NAME
into    #tables
from    INFORMATION_SCHEMA.TABLES
where   TABLE_SCHEMA = 'dbo';

drop table if exists #columns;
select  TABLE_NAME,
        COLUMN_NAME
into    #columns
from    INFORMATION_SCHEMA.COLUMNS
where   TABLE_SCHEMA = 'dbo'
and     TABLE_NAME in (
        select  top 3
                TABLE_NAME
        from    #tables)
order by 1, 2;

select  *
from    #columns;
```

Solution

  • One possible solution is to add set nocount on at the top of your chunk, like this:

    ```{sql test2, connection=con, output.var = "test2_df"}
    set nocount on
    
    drop table if exists #tables;
    select  TABLE_NAME
    into    #tables
    from    INFORMATION_SCHEMA.TABLES
    where   TABLE_SCHEMA = 'dbo';
    
    drop table if exists #columns;
    select  TABLE_NAME,
            COLUMN_NAME
    into    #columns
    from    INFORMATION_SCHEMA.COLUMNS
    where   TABLE_SCHEMA = 'dbo'
    and     TABLE_NAME in (
            select  top 3
                    TABLE_NAME
            from    #tables)
    order by 1, 2;
    
    select  *
    from    #columns;
    

    Another option is to use CTE, like this

    ```{sql test2, connection=con, output.var = "test2.df"}
    WITH TABLES AS (
    SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'dbo'
    )
    SELECT TABLE_NAME, COLUMN_NAME
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_SCHEMA = 'dbo' AND TABLE_NAME IN (
      SELECT TOP 3 TABLE_NAME FROM TABLES
    )
    ORDER BY 1,2;