I am trying to get the ids of the customers who have placed orders on consecutive days. The table is created as follows:
create table orders(
orderid INT,
orderdate date,
customerid int
);
The values:
insert into orders (orderid, orderdate, customerid)
values(1,'2023-06-20',1),
(2, '2023-06-21', 2),
(3, '2023-06-22', 3),
(4, '2023-06-22', 1),
(5, '2023-06-23', 3),
(6, '2023-06-22', 1),
(7, '2023-06-26', 4),
(8, '2023-06-27', 4),
(9, '2023-06-29', 4),
(10, '2023-06-29', 5),
(11, '2023-06-30', 5),
(12, '2023-06-28', 5),
(13, '2023-06-25', 4),
(14, '2023-06-24', 4),
(15, '2023-06-30', 4);
The code that I wrote gave the output if ids with orders consecutive days but left the ids of customers who had a gap in their order despite having more number of orders before the gap actually occurred. The code that I wrote:
with t1 as(
select customerid, orderdate,
case when lead(orderdate) over (partition by customerid order by orderdate) is null then 1
else abs(orderdate - lead(orderdate) over (partition by customerid order by orderdate)) end as gap
from orders)
select customerid, sum(gap) as consecutive
from t1
where gap>0
group by customerid
having count(*)=sum(gap) and count(*)>1;
The output:
+------------+------------------+
| customerid | consecutive_days |
+------------+------------------+
| 3 | 2 |
| 5 | 3 |
+------------+------------------+
The output I would want:
+------------+------------------+
| customerid | consecutive_days |
+------------+------------------+
| 3 | 2 |
| 4 | 4 |
| 4 | 2 |
| 5 | 3 |
+------------+------------------+
Since customer with customerid 4 has ordered from 2023-06-24 to 2023-06-27. The next order from the same customer is on 2023-06-29 and 2023-06-30 hence not continuous and should occur as a separate row.
Edit: The orders placed must be on consecutive days, irrespective of the number of orders placed on a single day.
This is an other solution to resolve this gaps and islands, it can be resolved by calculating the difference between row numbers (DENSE_RANK
is used because various customers may place orders on the same day) , which assigns a unique identifier to each sequence of consecutive orders for a customer :
WITH cte AS (
SELECT
*, DENSE_RANK() over(ORDER BY orderdate) - DENSE_RANK() over (PARTITION BY customerid ORDER BY orderdate) as grp
FROM orders
)
SELECT customerid, COUNT(DISTINCT orderdate) AS consecutive_days
FROM cte
WHERE grp > 0
GROUP BY customerid, grp
HAVING consecutive_days > 1;
Results :
customerid consecutive_days
3 2
4 4
4 2
5 3