I'm sorry if this question has been asked here before, but I can't seem to find it. I keep finding how to sum per hour, but my question is regarding SUM and COUNT between timestamps that are defined in another column.
I have one table called incoming_orders: It shows the intended destination, and the timestamp of the incoming order.
I have a second table called scheduled_output: It shows each scheduled output moment for each destination.
I have third table called outgoing_orders: It shows how the actual destination, and the timestamp of the outgoing order.
So, the data could be:
--Incoming_orders:
Destination Timestamp
ROUTE B 14/03/2018 7:48:00
ROUTE A 14/03/2018 7:58:00
ROUTE A 14/03/2018 12:48:00
ROUTE C 14/03/2018 13:28:00
--Scheduled_Output
ROUTE A 14/03/2018 8:00:00
ROUTE A 14/03/2018 11:00:00
ROUTE A 14/03/2018 12:00:00
ROUTE A 14/03/2018 17:00:00
ROUTE B 14/03/2018 8:00:00
ROUTE B 14/03/2018 10:00:00
ROUTE B 14/03/2018 12:00:00
ROUTE C 14/03/2018 07:00:00
ROUTE C 14/03/2018 14:00:00
ROUTE C 14/03/2018 17:00:00
--Which would lead to the following outgoing_orders:
ROUTE A 14/03/2018 8:00:00
ROUTE B 14/03/2018 8:00:00
ROUTE C 14/03/2018 14:00:00
ROUTE A 14/03/2018 17:00:00
Now, I want to check that the incoming order of 07:58 to route A actually made it into the output cycle of 08:00 for route A. I was thinking of creating a table like this to show it:
Destination output moment expected_output actual_output diff
Route A 8:00 1 1 0
Route A 11:00 0 0 0
Route A 12:00 0 0 0
Route A 17:00 1 1 0
But the question is: How do I calculate that expected_output column? How do I group the incoming order to Route A of 12:48 to the 12:00-17:00 group? It should count all the orders between scheduled output moments, but I'm not sure how to accomplish that.
Could I perhaps CEIL, FLOOR or ROUND to the closest scheduled_output value? Or can I somehow do a BETWEEN row n and n+1 with a rowcount? Or is there another, easier way?
I think it's easiest to determine the previous time of scheduled output, obtaining time intervals, more or less in this way:
SELECT destination,
time_stamp,
( SELECT max( time_stamp )
FROM SCHEDULED_OUTPUT t1
WHERE t.destination = t1.destination
AND t1.time_stamp < t.time_stamp
) as previous_time_stamp
FROM SCHEDULED_OUTPUT t
order by 1,2
or in a more compact form using analytic funtion:
SELECT destination,
time_stamp,
lag( time_stamp ) over (partition by destination order by time_stamp )
as previous_time_stamp
FROM SCHEDULED_OUTPUT t
order by 1,2
demo: http://sqlfiddle.com/#!4/c7bc9/1
| DESTINATION | TIME_STAMP | PREVIOUS_TIME_STAMP |
|-------------|-----------------------|-----------------------|
| ROUTE A | 2018-03-14 08:00:00.0 | (null) |
| ROUTE A | 2018-03-14 11:00:00.0 | 2018-03-14 08:00:00.0 |
| ROUTE A | 2018-03-14 12:00:00.0 | 2018-03-14 11:00:00.0 |
| ROUTE A | 2018-03-14 17:00:00.0 | 2018-03-14 12:00:00.0 |
| ROUTE B | 2018-03-14 08:00:00.0 | (null) |
| ROUTE B | 2018-03-14 10:00:00.0 | 2018-03-14 08:00:00.0 |
| ROUTE B | 2018-03-14 12:00:00.0 | 2018-03-14 10:00:00.0 |
| ROUTE C | 2018-03-14 07:00:00.0 | (null) |
| ROUTE C | 2018-03-14 14:00:00.0 | 2018-03-14 07:00:00.0 |
| ROUTE C | 2018-03-14 17:00:00.0 | 2018-03-14 14:00:00.0 |
and next the above resultset can be joined to INCOMING_ORDERS in order to calculate counts:
SELECT x.destination, x.time_stamp as output_moment,
count( y.DESTINATION ) as expected_output
FROM (
SELECT destination,
time_stamp,
lag( time_stamp ) over (partition by destination order by time_stamp )
as previous_time_stamp
FROM SCHEDULED_OUTPUT t
) x
LEFT JOIN INCOMING_ORDERS y
ON x.DESTINATION = y.DESTINATION
AND y.TIME_STAMP <= x.TIME_STAMP
AND ( y.TIME_STAMP > x.previous_time_stamp OR x.previous_time_stamp IS NULL )
GROUP BY x.destination, x.time_stamp
ORDER BY 1,2
Demo: http://sqlfiddle.com/#!4/c3958/2
| DESTINATION | OUTPUT_MOMENT | EXPECTED_OUTPUT |
|-------------|-----------------------|-----------------|
| ROUTE A | 2018-03-14 08:00:00.0 | 1 |
| ROUTE A | 2018-03-14 11:00:00.0 | 0 |
| ROUTE A | 2018-03-14 12:00:00.0 | 0 |
| ROUTE A | 2018-03-14 17:00:00.0 | 1 |
| ROUTE B | 2018-03-14 08:00:00.0 | 1 |
| ROUTE B | 2018-03-14 10:00:00.0 | 0 |
| ROUTE B | 2018-03-14 12:00:00.0 | 0 |
| ROUTE C | 2018-03-14 07:00:00.0 | 0 |
| ROUTE C | 2018-03-14 14:00:00.0 | 1 |
| ROUTE C | 2018-03-14 17:00:00.0 | 0 |
This condition:
AND y.TIME_STAMP <= x.TIME_STAMP
AND ( y.TIME_STAMP > x.previous_time_stamp OR x.previous_time_stamp IS NULL )
tells that if an order is placed at, say 8:00:00 and the route starts at the same time 8:00:00, then this order is still assigned to this "starting" route. If this is not possible (that is - the order must be assigned to the next route when it is placed at the exact time when the route starts), then change the condition to:
AND y.TIME_STAMP < x.TIME_STAMP
AND ( y.TIME_STAMP >= x.previous_time_stamp OR x.previous_time_stamp IS NULL )