Search code examples
sqloracleanalytics

SQL query challenge


So here's yet another 'write a query to X' challenge.

I'm monitoring a number of networked vending machines. Each machine has a number of parts, e.g. bank note acceptor, coin system, printer and so on.

Problems with machine parts are logged in table, let's call it 'faults', which looks something like this (irrelevant fields omitted):

machineid           partid         start_time            end_time
---------           ------         ----------------      ----------------
       1                2          2009-10-05 09:00      NULL
       1                3          2009-10-05 08:00      2009-10-05 10:00
       2                2          2009-09-30 12:00      2009-09-30 14:00
       3                4          2009-09-28 13:00      2009-09-28 15:00
       3                2          2009-09-28 12:00      2009-09-28 14:00

end_date is NULL if the problem is currently ongoing.

I need a query which show time periods for which the machine as a whole is down, and which can account for overlapping ranges, collapsing them down into a single record. So for the sample data above, it would produce:

machineid          start_time            end_time
---------          ----------------      ----------------
       1           2009-10-05 08:00      NULL
       2           2009-09-30 12:00      2009-09-30 14:00
       3           2009-09-28 12:00      2009-09-28 15:00

It's not tough to write procedural code to do this line by line, but a nice declarative SQL query would be more useful, more elegant. It seems like it ought to be possible, I just can't quite get there though.

SQL dialect is Oracle. Analytic functions are availabe if that would help.

Thanks!


Solution

  • using analytics, you can build a query that will make a single pass on the data (with a large data set this will be the most efficient):

    SELECT machineid, MIN(start_time), MAX(end_time)
      FROM (SELECT machineid, start_time, end_time, 
                   SUM(gap) over(PARTITION BY machineid 
                                 ORDER BY start_time) contiguous_faults
               FROM (SELECT machineid, start_time, 
                            coalesce(end_time, DATE '9999-12-31') end_time,
                             CASE
                                WHEN start_time > MAX(coalesce(end_time, 
                                                               DATE '9999-12-31'))
                                                  over(PARTITION BY machineid 
                                                       ORDER BY start_time 
                                                       ROWS BETWEEN UNBOUNDED PRECEDING
                                                                AND 1 preceding)
                                THEN 1
                             END gap
                        FROM faults))
     GROUP BY machineid, contiguous_faults
     ORDER BY 1, 2
    

    This query starts by determining if a row is contiguous to any row that started before. We then group the rows that are contiguous.