Search code examples
sqloracleoracle12cdatabase-performance

Most efficient way to compute difference between dates


I have a big table, partitioned by date, which is structured as follows:

InTable:

Key Date Field1 Value
key1 09/02/2021 ABC 10
key2 09/02/2021 DEF -4
key3 09/02/2021 GHI 9
key1 10/02/2021 ABC 30
key2 10/02/2021 DEFG 6
key4 10/02/2021 LMN 2

I want to output a table which is structured this way

OutTable:

Key Date Field1 Value T Value T-1 Daily Diff
key1 10/02/2021 ABC 30 10 20
key2 10/02/2021 DEFG 6 -4 10
key3 10/02/2021 GHI 0 9 -9
key4 10/02/2021 LMN 2 0 2

Where I collapse on a date the current value, the value of the previous day and the difference between the two which have the same key, keeping also all the cases when a key is missing in either of the dates (for instance, key3 missing on 10/2 and key4 missing on 9/2).

Field1 is usually the same if key is the same. If they are different, the value of the most recent one has to be kept (as it is for key2 in the example).

As far as I know, this can be done:

  • By self-joining the InTable on in1.key = in2.key and in1.day = in2.day-1
  • By generating a sparse matrix with an union all between two dates and then peforming a group by on the key field
  • By using the pivot construct

What is the most efficient and quicker way to do this job in Oracle?

Thanks a lot.


Solution

  • Here is what you could do (explanation after the code):

    alter session set nls_date_format = 'dd/mm/yyyy';
    
    with
      intable (key, date_, field1, value) as (
        select 'key1', to_date('09/02/2021'), 'ABC' , 10 from dual union all
        select 'key2', to_date('09/02/2021'), 'DEF' , -4 from dual union all
        select 'key3', to_date('09/02/2021'), 'GHI' ,  9 from dual union all
        select 'key1', to_date('10/02/2021'), 'ABC' , 30 from dual union all
        select 'key2', to_date('10/02/2021'), 'DEFG',  6 from dual union all
        select 'key4', to_date('10/02/2021'), 'LMN' ,  2 from dual
      )
    , ref_date (dt) as (select to_date('10/02/2021') from dual)
    , prep (key, field1, new_val, old_val) as (
        select i.key, 
               min(field1) keep (dense_rank last order by date_),
               nvl(min(case i.date_ when r.dt     then value end), 0),
               nvl(min(case i.date_ when r.dt - 1 then value end), 0)
        from   intable i cross join ref_date r
        where  date_ in (r.dt - 1, r.dt)
        group  by i.key, r.dt
      )
    select p.key, r.dt as date_, p.field1,
           p.new_val, p.old_val, p.new_val - p.old_val as daily_diff
    from   prep p cross join ref_date r
    order  by key   --  If needed
    ;
    
    
    KEY   DATE_       FIELD1  NEW_VAL  OLD_VAL  DAILY_DIFF
    ----  ----------  ------  -------  -------  ----------
    key1  10/02/2021  ABC          30       10          20
    key2  10/02/2021  DEFG          6       -4          10
    key3  10/02/2021  GHI           0        9          -9
    key4  10/02/2021  LMN           2        0           2
    

    Explanation:

    The INTABLE subquery in the WITH clause is just an easy way to simulate your input data (instead of creating a table and inserting in it, and then remembering to drop the table from my system). In real-life use, drop INTABLE from the query, and instead use your actual table and column names.

    Note the ALTER SESSION statement I ran first - so I don't need to give a format model to TO_DATE. Note also that I changed the column name from DATE since DATE is a reserved keyword in Oracle. (Don't use double-quotes around column names to work around that constraint; just use a different name, like I did.)

    You will see also that I used different column names in the output. Change them if you need to, but don't create column names with spaces in them, with dashes, etc. Use standard column names.

    REF_DATE is also probably not needed; I use it to give the "report date" - but it is better if that is a bind variable. Alternatively, use a bind variable instead of the hard-coded value I gave, right there in that subquery (and keep REF_DATE in the WITH clause).

    The main work is done in the PREP subquery. In fact, the optimizer will combine this subquery with the outer query (at the end of the code); I wrote it the way I did only to save some typing - this way I don't need to repeat the definitions of OLD_VAL and NEW_VAL to compute the difference. The optimized query (the result of the internal Oracle optimization) will only do one SELECT, and it will use "sort group by" (using the resulting sort for the ORDER BY of the outer query, in a single pass).

    The query (the PREP subquery) filters just for the two dates needed for reporting - which will be helped tremendously by an index on column DATE_. Then the query uses standard aggregation (not analytic functions or any other features) - which will very likely make it faster than the alternatives.

    It is not clear why you need the DATE_ column in the output. It is not based on anything in the data - it just reports back the "reference date" you gave as input. I included it in case you do really need it (it's very inexpensive - I cross-join again to the scalar subquery REF_DATE), but if you don't need it, you can remove it from the main (outer) query, and you won't need the cross join, and you won't need an alias for PREP and the P qualifiers for the other columns.