Hi there, I'm a beginner SQL user. I have access to a database for research, and there's one table of interest, structured as follows:
id | date | use | item |
---|---|---|---|
1 | 2023-02-01 | 90 | A |
1 | 2023-03-01 | 10 | B |
1 | 2023-03-15 | 15 | C |
2 | 2023-02-05 | 10 | B |
2 | 2023-02-13 | 30 | A |
This is a table of `item` acquisition `date` done by a personnel `id`. I want to transform this table based on the overlap of `date` for each personnel `id`, but I'm not quite sure how to achieve that.
I imagine the resulting table would be as follows:
id | start | end | item |
---|---|---|---|
1 | 1 | 29 | A |
1 | 30 | 39 | A, B |
1 | 40 | 43 | A |
1 | 44 | 58 | A, C |
1 | 59 | 90 | A |
2 | 1 | 8 | B |
2 | 9 | 10 | A, B |
2 | 11 | 38 | A |
My idea here is to transform the table using `date` as a starting index for each personnel `id`. This way, we can relate the transformed `date` to `use`, so that we obtain the relative ending date of each use as `start` + `use` - 1:
id | start | use | end | item |
---|---|---|---|---|
1 | 1 | 90 | 90 | A |
1 | 30 | 10 | 39 | B |
1 | 44 | 15 | 58 | C |
2 | 1 | 10 | 10 | B |
2 | 9 | 30 | 38 | A |
Visually, the item acquisition journey for each personnel would be:
Personnel 1:
                                              +--------------C (15)
                               +---------B (10)
+------------------------------------------------------------------------------------------A (90)
+------------------------------+--------------+-------------------------------------------->
1                              30             44                                          90
Personnel 2:
        +--------------------------------A (30)
+---------B (10)
+-------+-------------------------------->
1       9                               40
Finally, I can evaluate concurrent item acquisition as follows:
id | start | end | item |
---|---|---|---|
1 | 1 | 29 | A |
1 | 30 | 39 | A, B |
1 | 40 | 43 | A |
1 | 44 | 58 | A, C |
1 | 59 | 90 | A |
2 | 1 | 8 | B |
2 | 9 | 10 | A, B |
2 | 11 | 38 | A |
I found some relevant topics to this question, including:
-- Create a dummy table
CREATE TABLE tbl (
    id int,
    start_date date,
    day_of_use int,
    item varchar(8)
)
;
-- Populate the table with sample data (single quotes are the standard SQL string delimiter)
INSERT INTO tbl VALUES (1, '2023-02-01', 90, 'A');
INSERT INTO tbl VALUES (1, '2023-03-02', 10, 'B');
INSERT INTO tbl VALUES (1, '2023-03-15', 15, 'C');
INSERT INTO tbl VALUES (2, '2023-02-05', 10, 'B');
INSERT INTO tbl VALUES (2, '2023-02-13', 30, 'A');
-- Querying the overlap
SELECT
    a.id,
    DATEDIFF(a.start_date, b.init) + 1 AS start,
    day_of_use,
    DATEDIFF(a.start_date, b.init) + day_of_use AS end,
    item
FROM
    tbl AS a
LEFT JOIN
    (
        SELECT id, MIN(start_date) AS init
        FROM tbl
        GROUP BY id
        ORDER BY id
    ) AS b ON a.id = b.id
;
However, I'm not sure how to proceed from here. Any ideas would be appreciated, thanks in advance. SQL Fiddle: https://www.db-fiddle.com/f/eB1GnZTuuH5P71kr4pWqHu/2
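As a side note, the relative `start` + `use` - 1 arithmetic can probably be written without the self-join. The following is only a sketch, assuming MySQL 8.0 or later (window functions) and the `tbl` definition above; the output names `start_` and `end_` are illustrative choices, not part of the original query.

```sql
-- Sketch: relative start/end per id using a window function instead of a join.
-- MIN(start_date) OVER (PARTITION BY id) is the first acquisition date of each id.
SELECT
    id,
    DATEDIFF(start_date, MIN(start_date) OVER (PARTITION BY id)) + 1 AS start_,
    day_of_use,
    DATEDIFF(start_date, MIN(start_date) OVER (PARTITION BY id)) + day_of_use AS end_,
    item
FROM tbl
ORDER BY id, start_;
```

Both versions compute the same offsets; the window form just keeps everything in a single pass over `tbl`.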
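If you're using MySQL 8.0, you could try a recursive CTE that expands each record into one row per day of use, numbers the days for each `id` with `DENSE_RANK`, and aggregates the items in use on each day with `GROUP_CONCAT`. To address the gaps-and-islands problem, you can then rebuild your partitions using a running count of the changes in the item set between consecutive days: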
WITH RECURSIVE cte AS (
    -- Expand each record into one row per day of use
    SELECT id, start_date AS date_, day_of_use, item
    FROM tbl
    UNION ALL
    SELECT id, DATE_ADD(date_, INTERVAL 1 DAY), day_of_use - 1, item
    FROM cte
    WHERE day_of_use > 1  -- stop after exactly day_of_use rows per record
), cte2 AS (
    -- Number the distinct days of each id
    SELECT id,
           DENSE_RANK() OVER(PARTITION BY id ORDER BY date_) AS rn,
           item
    FROM cte
), cte3 AS (
    -- Items in use on each day, next to the previous day's items
    SELECT id,
           rn,
           GROUP_CONCAT(item ORDER BY item) AS items,
           LAG(GROUP_CONCAT(item ORDER BY item)) OVER(PARTITION BY id ORDER BY rn) AS prev_items
    FROM cte2
    GROUP BY id, rn
), cte4 AS (
    -- Running count of changes in the item set identifies each island
    SELECT id, rn, items,
           COUNT(CASE WHEN prev_items != items THEN 1 END) OVER(PARTITION BY id ORDER BY rn) AS parts
    FROM cte3
)
SELECT id, MIN(rn) AS start_, MAX(rn) AS end_, items
FROM cte4
GROUP BY id, items, parts
ORDER BY id, start_
Output:
id | start_ | end_ | items |
---|---|---|---|
1 | 1 | 29 | A |
1 | 30 | 39 | A,B |
1 | 40 | 42 | A |
1 | 43 | 57 | A,C |
1 | 58 | 90 | A |
2 | 1 | 8 | B |
2 | 9 | 10 | A,B |
2 | 11 | 38 | A |
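If expanding every record into one row per day becomes expensive, a boundary-based variant might be worth trying. This is only a sketch of an alternative technique, not the approach used above: it assumes MySQL 8.0 and the `tbl` table from the question, and the CTE and column names (`spans`, `bounds`, `segs`, `s`, `e`, `d`) are made up for illustration.

```sql
WITH spans AS (
    -- Relative first day (s) and last day (e) of each item, per id
    SELECT id, item,
           DATEDIFF(start_date, MIN(start_date) OVER (PARTITION BY id)) + 1 AS s,
           DATEDIFF(start_date, MIN(start_date) OVER (PARTITION BY id)) + day_of_use AS e
    FROM tbl
), bounds AS (
    -- Every day on which the set of active items can change
    SELECT id, s AS d FROM spans
    UNION
    SELECT id, e + 1 FROM spans
), segs AS (
    -- Consecutive boundaries delimit the candidate intervals
    SELECT id, d AS start_,
           LEAD(d) OVER (PARTITION BY id ORDER BY d) - 1 AS end_
    FROM bounds
)
SELECT g.id, g.start_, g.end_,
       GROUP_CONCAT(p.item ORDER BY p.item) AS items
FROM segs AS g
JOIN spans AS p
  ON p.id = g.id AND p.s <= g.start_ AND p.e >= g.end_  -- item covers the whole interval
WHERE g.end_ IS NOT NULL                                -- drop the open-ended last boundary
GROUP BY g.id, g.start_, g.end_
ORDER BY g.id, g.start_;
```

On the sample data this should give the intervals 1-29 (A), 30-39 (A,B), 40-42 (A), 43-57 (A,C), 58-90 (A) for `id` 1 and 1-8 (B), 9-10 (A,B), 11-38 (A) for `id` 2. Unlike the day-by-day query, it does not merge adjacent intervals that happen to carry the same item set, so back-to-back acquisitions of the same item would show up as separate rows.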
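Note that February 2023 has 28 days, not 29, so counting from 2023-02-01, the date 2023-03-15 falls on day 43 rather than day 44 as in your expected table.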
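The day numbers can be double-checked directly with `DATEDIFF`. A minimal check, using the dates from the sample `INSERT` statements for `id` 1 (the column aliases are illustrative):

```sql
-- Relative day number = days elapsed since the first acquisition date + 1
SELECT DATEDIFF('2023-03-02', '2023-02-01') + 1 AS b_start,  -- 30
       DATEDIFF('2023-03-15', '2023-02-01') + 1 AS c_start;  -- 43
```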