Search code examples
sqloracle-databaseoracle10gsql-optimization

join two Oracle tables, date to not overlapping date ranges


I have two tables:

  • trips: id_trip, id_object, trip_date, delta (8980026 rows)
  • ranges: id_range, id_object, date_since, date_until (18490 rows)

I need to optimize the following select statement

  select r.id_range, sum(t.delta) sum_deltas
    from trips t,
         ranges r
   where t.id_object = r.id_object
     and t.trip_date between r.date_since and r.date_until
group by r.id_range

according to the condition there is always exactly one matching row for trip in 'ranges'

  • the trips table is constantly growing, but there are no updates or deletions
  • table ranges may change from time to time in any way (deletions, updates, inserts), so function based index is not the way :(
  • there are indexes on id_object (in both tables) and date_since (in trips)

Does anyone have an idea how to speed things up, is it even possible?


Solution

  • It's always possible to speed things up; it just may not be worth the time / effort / money / disk-space / additional overheads etc.

    Firstly please use the explicit join syntax. It's been the SQL standard for a few decades now and it helps avoid a lot of potential errors. Your query would become:

    select r.id_range, sum(t.delta) sum_deltas
      from trips t
      join ranges r
        on t.id_object = r.id_object
       and t.trip_date between r.date_since and r.date_until
     group by r.id_range
    

    This query would imply that you need two indexes - unique if possible. On ranges you should have an index on id_object, date_since, date_until. The index on trips would be id_object, trip_date. If trips were smaller I might consider adding delta on to the end of that index so you never enter the table at all but only do a index scan. As it stands you're going to have to do a table access by index rowid.

    Having written all that your problem may be slightly different. You're going to be full-scanning both tables with this query. Your problem might be the indexes. If the optimizer is using the indexes then it's possible you're doing an index unique/range scan for each id_object in trips or ranges and then, because of the use of columns not in the indexes you will be doing an table access by index rowid. This can be massively expensive.

    Try adding a hint to force a full-scan of both tables:

    select /*+ full(t) full(r) */ r.id_range, sum(t.delta) sum_deltas
      from trips t
      join ranges r
        on t.id_object = r.id_object
       and t.trip_date between r.date_since and r.date_until
     group by r.id_range