Say you have a table of customers with purchase dates as follows:
[customer_table]
+----------+-----------+----------+
| customer | date | purchase |
+----------+-----------+----------+
| 1 | 1/01/2016 | 12 |
+----------+-----------+----------+
| 1 | 1/12/2016 | 3 |
+----------+-----------+----------+
| 2 | 5/03/2016 | 5 |
+----------+-----------+----------+
| 3 | 1/16/2016 | 6 |
+----------+-----------+----------+
| 3 | 3/22/2016 | 1 |
+----------+-----------+----------+
I want to write a query that counts how many distinct customers made a purchase in a rolling 10-day period: starting from each calendar day, count 10 days backwards. So for each unique day in 2016, the final output would be a calendar where each day carries the count of distinct customers appearing in the 10 days prior to that day, like the following:
[result_table]
+-----------+------------------+
| date | unique customers |
+-----------+------------------+
| 1/01/2016 | 112 |
+-----------+------------------+
| 1/02/2016 | 104 |
+-----------+------------------+
| 1/03/2016 | 140 |
+-----------+------------------+
| 1/04/2016 | 133 |
+-----------+------------------+
| .... | 121 |
+-----------+------------------+
One solution I came up with was to create a calendar table with a single date column, then join the calendar table to the customer table with an inequality join. I believe this is extremely inefficient and am seeking a faster solution. My first step was to create a calendar like so:
[calendar]
+-----------+
| date |
+-----------+
| 1/01/2016 |
+-----------+
| 1/02/2016 |
+-----------+
| 1/03/2016 |
+-----------+
| 1/04/2016 |
+-----------+
| 1/05/2016 |
+-----------+
Then, to count the distinct set of customers in the window before each day in that calendar, I join with an inequality like so:
select
    c.date,
    count(distinct m.customer) as unique_customers
from calendar c
left join mytable m
  on c.date >= m.date
 and m.date >= c.date - 10
group by c.date
order by c.date
While I believe this is correct, it runs extremely slowly (say, for a two-year calendar with a few million customers). Is there an Oracle analytic function that may help me out here?
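To pin down the window semantics, here is a brute-force Python sketch (purely illustrative, not the Oracle solution; the function name and the inclusive [day - 10, day] window are my reading of the join condition) applied to the sample rows above:

```python
from datetime import date, timedelta

# Sample rows from the customer table above: (customer, purchase date)
rows = [
    (1, date(2016, 1, 1)),
    (1, date(2016, 1, 12)),
    (2, date(2016, 5, 3)),
    (3, date(2016, 1, 16)),
    (3, date(2016, 3, 22)),
]

def rolling_unique_customers(rows, day, window_days=10):
    """Distinct customers with a purchase in [day - window_days, day],
    mirroring the join condition c.date >= m.date and m.date >= c.date - 10."""
    lo = day - timedelta(days=window_days)
    return len({cust for cust, dt in rows if lo <= dt <= day})

rolling_unique_customers(rows, date(2016, 1, 16))  # customers 1 and 3 -> 2
```

A query producing the calendar above should agree with this function for every day; that makes it easy to sanity-check any faster rewrite against a small sample.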
Is there an oracle analytic function that may help me out here?
Not really - from the COUNT() documentation:

"If you specify DISTINCT, then you can specify only the query_partition_clause of the analytic_clause. The order_by_clause and windowing_clause are not allowed."

You would want both DISTINCT and a windowing_clause, which isn't allowed.
Update:
You can get the same effect as the invalid syntax using a combination of a non-DISTINCT analytic query partitioned by customer, followed by aggregation by day:
Oracle Setup:
CREATE TABLE table_name ( customer, dt ) AS
SELECT 1, DATE '2017-01-10' FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-11' FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-15' FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-20' FROM DUAL UNION ALL
SELECT 2, DATE '2017-01-12' FROM DUAL UNION ALL
SELECT 2, DATE '2017-01-19' FROM DUAL UNION ALL
SELECT 3, DATE '2017-01-10' FROM DUAL UNION ALL
SELECT 3, DATE '2017-01-13' FROM DUAL UNION ALL
SELECT 3, DATE '2017-01-15' FROM DUAL UNION ALL
SELECT 3, DATE '2017-01-20' FROM DUAL;
Query:
Note: the query below uses only one month's data and a range of two days preceding, to illustrate the principle, but it is easy enough to change the parameters to 12 months and 10 days.
SELECT day,
SUM( has_order_in_range ) AS unique_customers
FROM (
SELECT customer,
day,
LEAST(
1,
COUNT(dt) OVER ( PARTITION BY customer
ORDER BY day
RANGE BETWEEN INTERVAL '2' DAY PRECEDING
AND INTERVAL '0' DAY FOLLOWING )
) AS has_order_in_range
FROM table_name t
PARTITION BY ( customer )
RIGHT OUTER JOIN
( -- Create a calendar for one month
SELECT DATE '2017-01-01' + LEVEL - 1 AS day
FROM DUAL
CONNECT BY DATE '2017-01-01' + LEVEL - 1 < ADD_MONTHS( DATE '2017-01-01', 1 )
) d
ON ( t.dt = d.day )
)
GROUP BY day
ORDER BY day;
Output:
DAY UNIQUE_CUSTOMERS
------------------- ----------------
2017-01-01 00:00:00 0
2017-01-02 00:00:00 0
2017-01-03 00:00:00 0
2017-01-04 00:00:00 0
2017-01-05 00:00:00 0
2017-01-06 00:00:00 0
2017-01-07 00:00:00 0
2017-01-08 00:00:00 0
2017-01-09 00:00:00 0
2017-01-10 00:00:00 2
2017-01-11 00:00:00 2
2017-01-12 00:00:00 3
2017-01-13 00:00:00 3
2017-01-14 00:00:00 2
2017-01-15 00:00:00 2
2017-01-16 00:00:00 2
2017-01-17 00:00:00 2
2017-01-18 00:00:00 0
2017-01-19 00:00:00 1
2017-01-20 00:00:00 3
2017-01-21 00:00:00 3
2017-01-22 00:00:00 2
2017-01-23 00:00:00 0
2017-01-24 00:00:00 0
2017-01-25 00:00:00 0
2017-01-26 00:00:00 0
2017-01-27 00:00:00 0
2017-01-28 00:00:00 0
2017-01-29 00:00:00 0
2017-01-30 00:00:00 0
2017-01-31 00:00:00 0
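The two-step trick above - a per-customer 0/1 flag from the non-DISTINCT analytic count, then a plain SUM per day - can be cross-checked outside the database. The following Python sketch (a brute-force re-implementation of the logic, not Oracle semantics verbatim) runs the same two steps over the same sample data and reproduces the output:

```python
from datetime import date, timedelta

# Same sample data as the Oracle setup: (customer, purchase date)
purchases = [
    (1, date(2017, 1, 10)), (1, date(2017, 1, 11)),
    (1, date(2017, 1, 15)), (1, date(2017, 1, 20)),
    (2, date(2017, 1, 12)), (2, date(2017, 1, 19)),
    (3, date(2017, 1, 10)), (3, date(2017, 1, 13)),
    (3, date(2017, 1, 15)), (3, date(2017, 1, 20)),
]

calendar = [date(2017, 1, 1) + timedelta(d) for d in range(31)]  # one month
customers = sorted({c for c, _ in purchases})
WINDOW = 2  # days preceding, as in the query above

unique_customers = {}
for day in calendar:
    lo = day - timedelta(days=WINDOW)
    # Step 1: LEAST(1, COUNT(dt) OVER ...) collapses each customer's
    # purchase count in [day - WINDOW, day] to a 0/1 flag.
    flags = [
        min(1, sum(1 for c, dt in purchases if c == cust and lo <= dt <= day))
        for cust in customers
    ]
    # Step 2: SUM(has_order_in_range) grouped by day.
    unique_customers[day] = sum(flags)
```

Because the flag is already 0 or 1 per customer, summing the flags gives a distinct count without ever needing COUNT(DISTINCT ...) in the windowed step - which is exactly why the restriction in the documentation stops mattering.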