Imagine I have this table:
item | state | date |
---|---|---|
A | blue | 2022-12-27 |
A | red | 2022-12-31 |
B | green | 2022-12-28 |
B | yellow | 2022-12-31 |
C | blue | 2022-12-29 |
D | red | 2022-12-26 |
The date column represents the date range of when the item existed. For example, "A" existed from 2022-12-27 to 2022-12-30 in the state "blue" and on the 2022-12-31 in the state "red". Now, I want to have the count of the items per date if they are in the range. The result table should look like this:
date | count_item | comment |
---|---|---|
2022-12-26 | 1 | (D) |
2022-12-27 | 1 | (A) |
2022-12-28 | 2 | (A & B) |
2022-12-29 | 3 | (A & B & C) |
2022-12-30 | 2 | (A & B) |
2022-12-31 | 2 | (A & B) |
How can I achieve this in ClickHouse?
I tried to use this code, but it returns the wrong results.
SELECT
toDate(date) AS d,
COUNT(item) AS i
FROM my_table
GROUP BY d
ORDER BY d ASC WITH FILL STEP toIntervalDay(1)
INTERPOLATE ( i AS i );
WITH
MYTAB AS
(
SELECT c1 as item, c2 as state, c3 as date
FROM VALUES
(
('A', 'blue', toDate ('2022-12-27'))
, ('A', 'red', toDate ('2022-12-31'))
, ('B', 'green', toDate ('2022-12-28'))
, ('B', 'yellow', toDate ('2022-12-31'))
, ('C', 'blue', toDate ('2022-12-29'))
, ('D', 'red', toDate ('2022-12-26'))
)
)
, D AS
(
SELECT toDate (arrayJoin (range (toUInt32 (min (date)), toUInt32 (max (date)) + 1))) AS dt
FROM MYTAB
)
, G AS
(
SELECT item, MIN (date) AS date_min, MAX (date) AS date_max
FROM MYTAB
GROUP BY item
)
SELECT
D.dt AS date
, count (1) AS count_item
, groupArray (G.item) AS comment
FROM D, G
WHERE D.dt BETWEEN G.date_min AND G.date_max
GROUP BY D.dt
ORDER BY D.dt
FORMAT PrettyCompact
;
The result is:
+-------date-+-count_item-+-comment-------+
| 2022-12-26 | 1 | ['D'] |
| 2022-12-27 | 1 | ['A'] |
| 2022-12-28 | 2 | ['B','A'] |
| 2022-12-29 | 3 | ['B','C','A'] |
| 2022-12-30 | 2 | ['B','A'] |
| 2022-12-31 | 2 | ['B','A'] |
+------------+------------+---------------+