Search code examples
sqloracle-databaseansi-sql

Insert result of a parametrized query into a CTE


I am querying a table containing billions of rows (BIG_TABLE) in Oracle. The table is partitionned on a monthly basis (DTE is the month of flux). The DBA wants us to filter our queries by month of flux. The DBA warned that (i) not doing so may cause slower queries and (ii) it may impact the others users so he may kill the non-filtered queries.

Since I need several months (sometimes up to 100), I use a parametrized query to pass the months I need (more below). I collect the results for all months in an intermediate table (INTERMEDIATE_TABLE, containing about 200 million rows per month) and then I aggregate the data (FINAL_TABLE, used for the analyses). The sum must be done by CHR whatever the month of flux.

-- query1
CREATE TABLE
  INTERMEDIATE_TABLE (
    CHR VARCHAR2(255),
    NBR NUMBER,
    DTE DATE
);

-- query2
INSERT INTO
  INTERMEDIATE_TABLE
SELECT
  CHR,
  NBR,
  DTE
FROM
  BIG_TABLE
WHERE
  DTE = TO_DATE(?, 'YYYY-MM-DD');

-- query3
CREATE TABLE
  FINAL_TABLE AS
SELECT
  CHR,
  SUM(NBR) AS NBR
FROM
  INTERMEDIATE_TABLE
GROUP BY
  CHR;

After saving the queries as strings (query1, query2, query3), I use R's DBI to run the queries. dbExecute() passes the dates one by one to the equal comparison in query2, so the query is executed 3 times (i.e. the list is not passed in one go).

library(DBI)
dbConnect(odbc::odbc(), ...)
dbExecute(con, query1)
dbExecute(con, query2, params = list(c("2020-01-01", "2020-02-01", "2020-03-01")))
dbExecute(con, query3)

I would like INTERMEDIATE_TABLE to be temporary. Can I INSERT INTO a CTE?

I am open to alternative solutions but I am looking for an ANSI solution because a move to PostgreSQL is planned. I would like to avoid temporary table because only the data is temporary in Oracle, not the table. Of course I could just DROP the table afterwards, but it seems more elegant/explicit (and efficient?) to make it temporary in the first place.


A reproducible example:

CREATE TABLE
  BIG_TABLE (
    CHR VARCHAR2(255),
    NBR NUMBER,
    DTE DATE
);

INSERT ALL
  INTO BIG_TABLE VALUES ('A', 2, DATE '2020-01-01')
  INTO BIG_TABLE VALUES ('B', 3, DATE '2020-01-01')
  INTO BIG_TABLE VALUES ('A', 1, DATE '2020-02-01')
  INTO BIG_TABLE VALUES ('B', 2, DATE '2020-02-01')
  INTO BIG_TABLE VALUES ('A', 3, DATE '2020-02-01')
  INTO BIG_TABLE VALUES ('B', 2, DATE '2020-03-01')
  INTO BIG_TABLE VALUES ('B', 4, DATE '2020-03-01')
  INTO BIG_TABLE VALUES ('C', 1, DATE '2020-03-01')
  INTO BIG_TABLE VALUES ('B', 4, DATE '2020-04-01')
  INTO BIG_TABLE VALUES ('D', 2, DATE '2020-05-01')
SELECT 1 FROM DUAL;

Desired output:

CHR NBR
  A   6
  B  11
  C   1

Solution

  • You don't need any kind of intermediate stage at all, not even a subquery.

    CREATE TABLE FINAL_TABLE AS
    SELECT CHR,
           SUM(NBR) NBR
      FROM BIG_TABLE
     WHERE DTE = ?
     GROUP BY CHR;
    

    Of course, with a billion rows, you want parallelism, which you may or may not be getting depending on how the table is configured. It might help to explicitly request it:

    CREATE TABLE FINAL_TABLE PARALLEL (DEGREE 16) NOLOGGING  AS
    SELECT CHR,
           SUM(NBR) NBR
      FROM BIG_TABLE
     WHERE DTE = ?
     GROUP BY CHR;
    

    If you need multiple months, you have several options. I suggest a single query in a manner that maximizes parallel thread distribution by partition, if this is a partitioned table:

    CREATE TABLE final_table PARALLEL (DEGREE 16) NOLOGGING AS
    SELECT chr,
           SUM(nbr) nbr
      FROM (SELECT chr,
                   SUM(nbr) nbr
              FROM big_table
             WHERE dte = ?
             GROUP BY chr
            UNION ALL
            SELECT chr,
                   SUM(nbr) nbr
              FROM big_table
             WHERE dte = ?
             GROUP BY chr
            UNION ALL
            SELECT chr,
                   SUM(nbr) nbr
              FROM big_table
             WHERE dte = ?
             GROUP BY chr)
      GROUP BY chr
    

    If you need to pull many months and have them all end up in this table while only pulling one month at a time, then precreate the table and insert each month:

    CREATE TABLE final_table(dte date,chr varchar2(18),nbr integer); -- one time
    
    INSERT /*+ append */ into final_table
    SELECT /*+ parallel(16) */
           dte,
           chr,
           SUM(nbr) nbr
      FROM big_table
     WHERE dte = ?
     GROUP BY dte, 
              chr;
    
    COMMIT;
    
    -- repeat for every month. 
    

    Then at the end if you need the whole aggregated across time, you can do a group by to exclude the date:

    SELECT chr,
           SUM(nbr) nbr
      FROM final_table
     GROUP BY chr