I have a big table, partitioned by date, which is structured as follows:
InTable:
Key | Date | Field1 | Value |
---|---|---|---|
key1 | 09/02/2021 | ABC | 10 |
key2 | 09/02/2021 | DEF | -4 |
key3 | 09/02/2021 | GHI | 9 |
key1 | 10/02/2021 | ABC | 30 |
key2 | 10/02/2021 | DEFG | 6 |
key4 | 10/02/2021 | LMN | 2 |
I want to output a table which is structured this way
OutTable:
Key | Date | Field1 | Value T | Value T-1 | Daily Diff |
---|---|---|---|---|---|
key1 | 10/02/2021 | ABC | 30 | 10 | 20 |
key2 | 10/02/2021 | DEFG | 6 | -4 | 10 |
key3 | 10/02/2021 | GHI | 0 | 9 | -9 |
key4 | 10/02/2021 | LMN | 2 | 0 | 2 |
Where I collapse on a date the current value, the value of the previous day and the difference between the two which have the same key, keeping also all the cases when a key is missing in either of the dates (for instance, key3 missing on 10/2 and key4 missing on 9/2).
Field1 is usually the same if key is the same. If they are different, the value of the most recent one has to be kept (as it is for key2 in the example).
As far as I know, this can be done:
What is the most efficient and quicker way to do this job in Oracle?
Thanks a lot.
Here is what you could do (explanation after the code):
alter session set nls_date_format = 'dd/mm/yyyy';
with
intable (key, date_, field1, value) as (
select 'key1', to_date('09/02/2021'), 'ABC' , 10 from dual union all
select 'key2', to_date('09/02/2021'), 'DEF' , -4 from dual union all
select 'key3', to_date('09/02/2021'), 'GHI' , 9 from dual union all
select 'key1', to_date('10/02/2021'), 'ABC' , 30 from dual union all
select 'key2', to_date('10/02/2021'), 'DEFG', 6 from dual union all
select 'key4', to_date('10/02/2021'), 'LMN' , 2 from dual
)
, ref_date (dt) as (select to_date('10/02/2021') from dual)
, prep (key, field1, new_val, old_val) as (
select i.key,
min(field1) keep (dense_rank last order by date_),
nvl(min(case i.date_ when r.dt then value end), 0),
nvl(min(case i.date_ when r.dt - 1 then value end), 0)
from intable i cross join ref_date r
where date_ in (r.dt - 1, r.dt)
group by i.key, r.dt
)
select p.key, r.dt as date_, p.field1,
p.new_val, p.old_val, p.new_val - p.old_val as daily_diff
from prep p cross join ref_date r
order by key -- If needed
;
KEY DATE_ FIELD1 NEW_VAL OLD_VAL DAILY_DIFF
---- ---------- ------ ------- ------- ----------
key1 10/02/2021 ABC 30 10 20
key2 10/02/2021 DEFG 6 -4 10
key3 10/02/2021 GHI 0 9 -9
key4 10/02/2021 LMN 2 0 2
Explanation:
The INTABLE
subquery in the WITH
clause is just an easy way to simulate your input data (instead of creating a table and inserting in it, and then remembering to drop the table from my system). In real-life use, drop INTABLE
from the query, and instead use your actual table and column names.
Note the ALTER SESSION
statement I ran first - so I don't need to give a format model to TO_DATE
. Note also that I changed the column name from DATE
since DATE
is a reserved keyword in Oracle. (Don't use double-quotes around column names to work around that constraint; just use a different name, like I did.)
You will see also that I used different column names in the output. Change them if you need to, but don't create column names with spaces in them, with dashes, etc. Use standard column names.
REF_DATE
is also probably not needed; I use it to give the "report date" - but it is better if that is a bind variable. Alternatively, use a bind variable instead of the hard-coded value I gave, right there in that subquery (and keep REF_DATE
in the WITH
clause).
The main work is done in the PREP
subquery. In fact, the optimizer will combine this subquery with the outer query (at the end of the code); I wrote it the way I did only to save some typing - this way I don't need to repeat the definitions of OLD_VAL
and NEW_VAL
to compute the difference. The optimized query (the result of the internal Oracle optimization) will only do one SELECT
, and it will use "sort group by" (using the resulting sort for the ORDER BY
of the outer query, in a single pass).
The query (the PREP
subquery) filters just for the two dates needed for reporting - which will be helped tremendously by an index on column DATE_
. Then the query uses standard aggregation (not analytic functions or any other features) - which will very likely make it faster than the alternatives.
It is not clear why you need the DATE_
column in the output. It is not based on anything in the data - it just reports back the "reference date" you gave as input. I included it in case you do really need it (it's very inexpensive - I cross-join again to the scalar subquery REF_DATE
), but if you don't need it, you can remove it from the main (outer) query, and you won't need the cross join, and you won't need an alias for PREP
and the P
qualifiers for the other columns.