Search code examples
mysqlwindow-functionsgaps-and-islands

MySQL query to find the customers who have placed orders on consecutive days


I am trying to get the ids of the customers who have placed orders on consecutive days. The table is created as follows:

create table orders(
  orderid INT,
  orderdate date,
  customerid int
);

The values:

insert into orders (orderid, orderdate, customerid)
values(1,'2023-06-20',1),
    (2, '2023-06-21', 2),
    (3, '2023-06-22', 3),
    (4, '2023-06-22', 1),
    (5, '2023-06-23', 3),
    (6, '2023-06-22', 1),
    (7, '2023-06-26', 4),
    (8, '2023-06-27', 4),
    (9, '2023-06-29', 4),
    (10, '2023-06-29', 5),
    (11, '2023-06-30', 5),
    (12, '2023-06-28', 5),
    (13, '2023-06-25', 4),
    (14, '2023-06-24', 4),
    (15, '2023-06-30', 4);

The code that I wrote gave the output if ids with orders consecutive days but left the ids of customers who had a gap in their order despite having more number of orders before the gap actually occurred. The code that I wrote:

with t1 as(
select customerid, orderdate,
case when lead(orderdate) over (partition by customerid order by orderdate) is null then 1
else abs(orderdate - lead(orderdate) over (partition by customerid order by orderdate)) end as gap
from orders)
select customerid, sum(gap) as consecutive
from t1
where gap>0
group by customerid
having count(*)=sum(gap) and count(*)>1;

The output:

+------------+------------------+
| customerid | consecutive_days |
+------------+------------------+
|          3 |                2 |
|          5 |                3 |
+------------+------------------+

The output I would want:

+------------+------------------+
| customerid | consecutive_days |
+------------+------------------+
|          3 |                2 |
|          4 |                4 |
|          4 |                2 |
|          5 |                3 |
+------------+------------------+

Since customer with customerid 4 has ordered from 2023-06-24 to 2023-06-27. The next order from the same customer is on 2023-06-29 and 2023-06-30 hence not continuous and should occur as a separate row.

Edit: The orders placed must be on consecutive days, irrespective of the number of orders placed on a single day.


Solution

  • This is an other solution to resolve this gaps and islands, it can be resolved by calculating the difference between row numbers (DENSE_RANK is used because various customers may place orders on the same day) , which assigns a unique identifier to each sequence of consecutive orders for a customer :

    WITH cte AS (
      SELECT
        *, DENSE_RANK() over(ORDER BY orderdate) - DENSE_RANK() over (PARTITION BY customerid ORDER BY orderdate) as grp
      FROM orders
    )
    SELECT customerid, COUNT(DISTINCT orderdate) AS consecutive_days 
    FROM cte
    WHERE grp > 0
    GROUP BY customerid, grp
    HAVING consecutive_days > 1;
    

    Results :

    customerid  consecutive_days
    3           2
    4           4
    4           2
    5           3
    

    Demo here