Imagine you have a set of dates. You want any date which is within X days of the lowest date to be "merged" into that date. Then you want to repeat until you have merged all date points.
For example:
ID | DatePoints |
---|---|
1 | 2023-01-01 |
2 | 2023-01-02 |
3 | 2023-01-12 |
4 | 2023-01-21 |
5 | 2023-02-01 |
6 | 2023-02-02 |
7 | 2023-03-01 |
If you applied this rule to this data using 10 days as your X, you would end up with this output:
DateRangeStarts |
---|
2023-01-01 |
2023-01-12 |
2023-02-01 |
2023-03-01 |
IDs 1 and 2 into range 1, IDs 3 and 4 into range 2, IDs 5 and 6 into range 3, and ID 7 into range 4.
Is there any way to do this without a loop? Answer can work in SQL Server or BigQuery. Thanks
You could consider something like the following. It's not pretty and I'm not at all confident it is the best solution, but I do think it works. Maybe it's a good starting point for you to work from.
WITH cte AS
(
SELECT min(datepoint) datepoint
FROM test
UNION ALL
SELECT min(t.datepoint) OVER() datepoint
FROM test t CROSS APPLY (SELECT max(cte.datepoint) OVER() md FROM cte) c
WHERE t.datepoint > DATEADD(DAY, 10, c.md)
)
SELECT distinct datepoint
FROM cte
ORDER BY datepoint
(You might want to change the >
to a >=
, depending on what counts as within X days.)
The basic idea is to get the minimum date from your table into the cte, then recursively get the minimum date from your table that is bigger than the current maximum date in the cte + X days.
It gets messy because of the limitations SQL Server places on recursive CTEs. They can't be used in subqueries, with normal OUTER JOIN
s, or with aggregate functions. Therefore, I use CROSS APPLY
and the window versions of min
/max
. This gets the correct result, but multiple times, so I'm forced to use DISTINCT
to clean it up afterward.
Depending on your data, it might be better to do a loop anyway, but I think this is an option to consider.
Here's a Fiddle of it working.