Search code examples
sqlmysqlcountdistinctwindow-functions

SQL Count Distinct over Rolling Window on Timestamp


I have this SQL query:

SELECT COUNT(DISTINCT CustomerName) over(
            ORDER BY OrderTimestamp 
            RANGE BETWEEN INTERVAL 2 hour PRECEDING AND CURRENT ROW 
        ) AS count_per_time
FROM Orders
WHERE CustomerName IS NOT null
AND CustomerName != ''

but it doesn't work because DISTINCT is not allowed with OVER clause. Could anyone help me how to solve it, please? Thank you very much.


Solution

  • I don't think that this can be solved with window functions.

    There is an emulation technique that uses a substraction of dense_ranks, but the latter does not support window frame specifications (ie: the range / rows syntax in your original code). Well, to be precise, it actually allows the syntax, but silently ignores it : so it still operates over the whole partition. This is explained in the documentation:

    Standard SQL specifies that window functions that operate on the entire partition should have no frame clause. MySQL permits a frame clause for such functions but ignores it. These functions use the entire partition even if a frame is specified:

    CUME_DIST()
    DENSE_RANK()
    LAG()
    LEAD()
    NTILE()
    PERCENT_RANK()
    RANK()
    ROW_NUMBER()
    

    It seems like the only option left is a subquery:

    SELECT o.*,
        (
            SELECT COUNT(DISTINCT o1.CustomerName) 
            FROM Orders o1
            WHERE o1.CustomerName != ''
            WHERE o1.OrderTimestamp BETWEEN o.OrderTimestamp - INTERVAL 2 HOUR AND o.OrderTimestamp
        ) AS count_per_time
    FROM Orders o
    WHERE o.CustomerName IS NOT null and o.CustomerName != ''