Maybe it's an easy problem, but I can't figure out how to solve it. I'm working on ClickHouse 20.12.5.14. My goal is to get the difference (in minutes) between two datetimes, excluding some days which are configured in a "non-working-days" table. Please notice that I'm not interested in excluding only Sundays and Saturdays: I should be able to exclude from the calculation only specific dates.
These are my tables (definitions at the end of the text):
select *
from test_orders_workflow
order by order_no, phase_no;
order_no|phase_no|phase_descr |phase_date |
--------|--------|----------------------|-------------------|
O_1342 | 10|Order placed |2021-01-04 10:20:00|
O_1342 | 20|Payment processing |2021-01-06 10:00:00|
O_1342 | 30|Order Fulfillment |2021-01-08 11:00:00|
O_1342 | 40|Shipping and Delivery |2021-01-14 13:30:00|
O_6543 | 10|Order placed |2021-02-03 15:00:00|
O_6543 | 20|Payment processing |2021-02-03 17:30:00|
O_6543 | 25|Payment refused |2021-02-03 17:33:00|
O_7836 | 10|Order placed |2021-01-04 10:30:00|
O_7836 | 15|Order Cancelled |2021-01-10 16:00:00|
select * from test_orders_nwd;
not_w_day |
----------|
2021-01-01|
2021-01-07|
2021-02-01|
Of course, getting only the deltas is quite straightforward:
with t as (
select order_no,
arraySort(groupArray(phase_no)) phases_no,
arraySort((x,y) -> y, groupArray(phase_descr), phases_no) phases_descr,
arraySort((x,y) -> y, groupArray(phase_date), phases_no) phases_end,
arrayPushFront(arrayPopBack(phases_end), phases_end [1]) phases_begin,
arrayMap((x,y) -> trunc((x-y)/60), phases_end, phases_begin) phases_duration
from test_orders_workflow
group by order_no
)
select
order_no,
phases_no as phase_no,
phases_descr as phase_descr,
phases_begin as phase_begin,
phases_end as phase_end,
phases_duration as minutes
from
t
array join phases_no,
phases_descr,
phases_begin,
phases_end,
phases_duration
order by
order_no,
phases_no;
order_no|phase_no|phase_descr |phase_begin |phase_end |minutes|
--------|--------|----------------------|-------------------|-------------------|-------|
O_1342 | 10|Order placed |2021-01-04 10:20:00|2021-01-04 10:20:00| 0.0|
O_1342 | 20|Payment processing |2021-01-04 10:20:00|2021-01-06 10:00:00| 2860.0|
O_1342 | 30|Order Fulfillment |2021-01-06 10:00:00|2021-01-08 11:00:00| 2940.0|
O_1342 | 40|Shipping and Delivery |2021-01-08 11:00:00|2021-01-14 13:30:00| 8790.0|
O_6543 | 10|Order placed |2021-02-03 15:00:00|2021-02-03 15:00:00| 0.0|
O_6543 | 20|Payment processing |2021-02-03 15:00:00|2021-02-03 17:30:00| 150.0|
O_6543 | 25|Payment refused |2021-02-03 17:30:00|2021-02-03 17:33:00| 3.0|
O_7836 | 10|Order placed |2021-01-04 10:30:00|2021-01-04 10:30:00| 0.0|
O_7836 | 15|Order Cancelled |2021-01-04 10:30:00|2021-01-10 16:00:00| 8970.0|
but I can't figure out how to exclude from the difference the dates stored in the non-working-days definition. What I'm trying to get, is something like this (look at the third and last rows in the two last columns):
order_no|phase_no|phase_descr |phase_begin |phase_end |minutes|working_minutes|
--------|--------|----------------------|-------------------|-------------------|-------|---------------|
O_1342 | 10|Order placed |2021-01-04 10:20:00|2021-01-04 10:20:00| 0.0| 0.0|
O_1342 | 20|Payment processing |2021-01-04 10:20:00|2021-01-06 10:00:00| 2860.0| 2860.0|
O_1342 | 30|Order Fulfillment |2021-01-06 10:00:00|2021-01-08 11:00:00| 2940.0| 1500.0|
O_1342 | 40|Shipping and Delivery |2021-01-08 11:00:00|2021-01-14 13:30:00| 8790.0| 8790.0|
O_6543 | 10|Order placed |2021-02-03 15:00:00|2021-02-03 15:00:00| 0.0| 0.0|
O_6543 | 20|Payment processing |2021-02-03 15:00:00|2021-02-03 17:30:00| 150.0| 150.0|
O_6543 | 25|Payment refused |2021-02-03 17:30:00|2021-02-03 17:33:00| 3.0| 3.0|
O_7836 | 10|Order placed |2021-01-04 10:30:00|2021-01-04 10:30:00| 0.0| 0.0|
O_7836 | 15|Order Cancelled |2021-01-04 10:30:00|2021-01-10 16:00:00| 8970.0| 7530.0|
My approach was to include in the query the count of non-working-days:
...
count(*)*24*60 from test_orders_nwd where not_w_day between the first and the second date
...
but it was unsuccessful, since Clickhouse doesn't allow you to include a subquery like this (without a join), neither in a standard query, nor using arrays. The following, for example, gives you an exception:
select order_no, phase_no, phase_descr, phase_begin, phase_end, minutes,
minutes - (select count(*)
from test_orders_nwd tt
where tt.not_w_day between phase_begin and phase_end
)
from (
with t as (
select order_no,
arraySort(groupArray(phase_no)) phases_no,
arraySort((x,y) -> y, groupArray(phase_descr), phases_no) phases_descr,
arraySort((x,y) -> y, groupArray(phase_date), phases_no) phases_end,
arrayPushFront(arrayPopBack(phases_end), phases_end [1]) phases_begin,
arrayMap((x,y) -> trunc((x-y)/60), phases_end, phases_begin) phases_duration
from test_orders_workflow
group by order_no
)
select
order_no,
phases_no as phase_no,
phases_descr as phase_descr,
phases_begin as phase_begin,
phases_end as phase_end,
phases_duration as minutes
from
t
array join phases_no,
phases_descr,
phases_begin,
phases_end,
phases_duration
order by
order_no,
phases_no
);
-->
ClickHouse exception, code: 47, host: 10.0.1.137, port: 8123; Code: 47, e.displayText() = DB::Exception: Missing columns: 'phase_end' 'phase_begin' while processing query: 'SELECT count() FROM test_orders_nwd AS tt WHERE (not_w_day >= phase_begin) AND (not_w_day <= phase_end)', required columns: 'not_w_day' 'phase_begin' 'phase_end', source columns: 'not_w_day': While processing (SELECT count(*) FROM test_orders_nwd AS tt WHERE (tt.not_w_day >= phase_begin) AND (tt.not_w_day <= phase_end)) AS _subquery13741024: While processing minutes - ((SELECT count(*) FROM test_orders_nwd AS tt WHERE (tt.not_w_day >= phase_begin) AND (tt.not_w_day <= phase_end)) AS _subquery13741024) (version 20.12.5.14 (official build))
and the same is when referencing the external table from the arrayMap function.
By the way, using a static array works fine, but I don't think it possible to compare the dates in the array with "x" and "y" in the lambda function:
select order_no, phase_no, phase_descr, phase_begin, phase_end, minutes, minutes_wd
from (
with t as (
with ['2021-01-04','2021-01-05'] as excluded_days
select order_no,
arraySort(groupArray(phase_no)) phases_no,
arraySort((x,y) -> y, groupArray(phase_descr), phases_no) phases_descr,
arraySort((x,y) -> y, groupArray(phase_date), phases_no) phases_end,
arrayPushFront(arrayPopBack(phases_end), phases_end [1]) phases_begin,
arrayMap((x,y) -> trunc((x-y)/60), phases_end, phases_begin) phases_duration,
arrayMap((x,y) -> trunc((x-y - length(excluded_days)*24*60)/60) , phases_end, phases_begin) phases_duration_wd
from test_orders_workflow
group by order_no
)
select
order_no,
phases_no as phase_no,
phases_descr as phase_descr,
phases_begin as phase_begin,
phases_end as phase_end,
phases_duration as minutes,
phases_duration_wd as minutes_wd
from
t
array join phases_no,
phases_descr,
phases_begin,
phases_end,
phases_duration,
phases_duration_wd
order by
order_no,
phases_no
);
What should I do?
Thanks in advance if you can help me. Here you can find the definitions, in case you want to try:
create table test_orders_workflow
(order_no String, phase_no Int8, phase_descr String, phase_date Datetime) engine = Log;
insert into test_orders_workflow values
('O_1342',10,'Order placed ',toDateTime('2021-01-04 10:20:00')),
('O_1342',20,'Payment processing',toDateTime('2021-01-06 10:00:00')),
('O_1342',30,'Order Fulfillment ',toDateTime('2021-01-08 11:00:00')),
('O_1342',40,'Shipping and Delivery ',toDateTime('2021-01-14 13:30:00')),
('O_7836',10,'Order placed ',toDateTime('2021-01-04 10:30:00')),
('O_7836',15,'Order Cancelled ',toDateTime('2021-01-10 16:00:00')),
('O_6543',10,'Order placed ',toDateTime('2021-02-03 15:00:00')),
('O_6543',20,'Payment processing',toDateTime('2021-02-03 17:30:00')),
('O_6543',25,'Payment refused',toDateTime('2021-02-03 17:33:00'));
create table test_orders_nwd (not_w_day date) engine = Log;
insert into test_orders_nwd values
('2021-01-01'),
('2021-01-07'),
('2021-02-01');
with t as (
select
(select groupArray(not_w_day) from test_orders_nwd) as gnot_w_day,
order_no,
arraySort(groupArray(phase_no)) phases_no,
arraySort((x,y) -> y, groupArray(phase_descr), phases_no) phases_descr,
arraySort((x,y) -> y, groupArray(phase_date), phases_no) phases_end,
arrayPushFront(arrayPopBack(phases_end), phases_end [1]) phases_begin,
arrayMap((x,y) -> (trunc((x-y)/60), trunc((x-y)/60) - 24*60*length(arrayFilter(z -> z between y and x, gnot_w_day))), phases_end, phases_begin) phases_duration
from test_orders_workflow
group by order_no
)
select
order_no,
phase_no,
phase_descr,
phase_begin,
phase_end,
t_minutes.1 minutes,
t_minutes.2 working_minutes
from
t
array join phases_no as phase_no,
phases_descr as phase_descr,
phases_begin as phase_begin,
phases_end as phase_end,
phases_duration as t_minutes
order by
order_no,
phases_no;
┌─order_no─┬─phase_no─┬─phase_descr────────────┬─────────phase_begin─┬───────────phase_end─┬─minutes─┬─working_minutes─┐
│ O_1342 │ 10 │ Order placed │ 2021-01-04 10:20:00 │ 2021-01-04 10:20:00 │ 0 │ 0 │
│ O_1342 │ 20 │ Payment processing │ 2021-01-04 10:20:00 │ 2021-01-06 10:00:00 │ 2860 │ 2860 │
│ O_1342 │ 30 │ Order Fulfillment │ 2021-01-06 10:00:00 │ 2021-01-08 11:00:00 │ 2940 │ 1500 │
│ O_1342 │ 40 │ Shipping and Delivery │ 2021-01-08 11:00:00 │ 2021-01-14 13:30:00 │ 8790 │ 8790 │
│ O_6543 │ 10 │ Order placed │ 2021-02-03 15:00:00 │ 2021-02-03 15:00:00 │ 0 │ 0 │
│ O_6543 │ 20 │ Payment processing │ 2021-02-03 15:00:00 │ 2021-02-03 17:30:00 │ 150 │ 150 │
│ O_6543 │ 25 │ Payment refused │ 2021-02-03 17:30:00 │ 2021-02-03 17:33:00 │ 3 │ 3 │
│ O_7836 │ 10 │ Order placed │ 2021-01-04 10:30:00 │ 2021-01-04 10:30:00 │ 0 │ 0 │
│ O_7836 │ 15 │ Order Cancelled │ 2021-01-04 10:30:00 │ 2021-01-10 16:00:00 │ 8970 │ 7530 │
└──────────┴──────────┴────────────────────────┴─────────────────────┴─────────────────────┴─────────┴─────────────────┘