sql date google-bigquery sql-server-2012 range

Dynamically generating date range starts in SQL

Imagine you have a set of dates. You want any date which is within X days of the lowest date to be "merged" into that date. Then you want to repeat until you have merged all date points.

For example:

ID	DatePoints
1	2023-01-01
2	2023-01-02
3	2023-01-12
4	2023-01-21
5	2023-02-01
6	2023-02-02
7	2023-03-01

If you applied this rule to this data using 10 days as your X, you would end up with this output:

DateRangeStarts
2023-01-01
2023-01-12
2023-02-01
2023-03-01

IDs 1 and 2 into range 1, IDs 3 and 4 into range 2, IDs 5 and 6 into range 3, and ID 7 into range 4.

Is there any way to do this without a loop? Answer can work in SQL Server or BigQuery. Thanks

Solution

You could consider something like the following. It's not pretty and I'm not at all confident it is the best solution, but I do think it works. Maybe it's a good starting point for you to work from.

WITH cte AS
(
  SELECT min(datepoint) datepoint
  FROM test
  UNION ALL
  SELECT min(t.datepoint) OVER() datepoint
  FROM test t CROSS APPLY (SELECT max(cte.datepoint) OVER() md FROM cte) c
  WHERE t.datepoint > DATEADD(DAY, 10, c.md)
)
SELECT distinct datepoint
FROM cte
ORDER BY datepoint

(You might want to change the > to a >=, depending on what counts as within X days.)

The basic idea is to get the minimum date from your table into the cte, then recursively get the minimum date from your table that is bigger than the current maximum date in the cte + X days.

It gets messy because of the limitations SQL Server places on recursive CTEs. They can't be used in subqueries, with normal OUTER JOINs, or with aggregate functions. Therefore, I use CROSS APPLY and the window versions of min/max. This gets the correct result, but multiple times, so I'm forced to use DISTINCT to clean it up afterward.

Depending on your data, it might be better to do a loop anyway, but I think this is an option to consider.

Here's a Fiddle of it working.