Search code examples
sqloraclequery-optimization

Number of unique dates


There is table:

CREATE TABLE my_table 
  (gr_id      NUMBER,
   start_date DATE,
   end_date   DATE);

All dates always have zero time portion. I need to know a fastest way to compute number of unique dates inside gr_id.

For example, if there is rows (dd.mm.rrrr):

1 | 01.01.2000 | 07.01.2000
1 | 01.01.2000 | 07.01.2000
2 | 01.01.2000 | 03.01.2000
2 | 05.01.2000 | 07.01.2000
3 | 01.01.2000 | 04.01.2000
3 | 03.01.2000 | 05.01.2000

then right answer will be

1 | 7 
2 | 6 
3 | 5

At now I use additional table

CREATE TABLE mfr_date_list
   (MFR_DATE DATE);

with every date between 01.01.2000 and 31.12.2020 and query like this:

SELECT COUNT(DISTINCT mfr_date_list.mfr_date) cnt, 
       dt.gr_id
  FROM dwh_mfr.mfr_date_list,
        (SELECT gr_id, 
                start_date AS sd, 
                end_date AS ed
               FROM my_table
        ) dt
 WHERE mfr_date_list.mfr_date BETWEEN dt.sd AND dt.ed
   AND dt.ed IS NOT NULL
 GROUP BY dt.gr_id

This query return correct resul data set, but I think it's not fastest way. I think there is some way to build query withot table mfr_date_list at all.

Oracle 11.2 64-bit.


Solution

  • I would expect what you're doing to be the fastest way (as always test). Your query can be simplified, though this only aids understanding and not necessarily speed:

    select t.gr_id, count(distinct dl.mfr_date) as cnt
      from my_table t
      join mfr_date_list dl
        on dl.mfr_date between t.date_start and t.date_end
     where t.end_date is not null
     group by t.gr_id
    

    Whatever you do you have to generate the data between the two dates somehow as you need to remove the overlap. One way would be to use CAST(MULTISET()), as Lalit Kumar explains:

    select gr_id, count(distinct end_date - column_value + 1)
      from my_table m
     cross join table(cast(multiset(select level
                                      from dual
                                   connect by level <= m.end_date - m.start_date + 1
                                           ) as sys.odcinumberlist))
     group by gr_id;
    
         GR_ID COUNT(DISTINCTEND_DATE-COLUMN_VALUE+1)
    ---------- --------------------------------------
             1                                      7
             2                                      6
             3                                      5
    

    This is very Oracle specific but should perform substantially better than most other row-generators as you're only accessing the table once and you're generating the minimal number of rows required due to the condition linking MY_TABLE and your generated rows.