I want to write my own SQL scalar function that calculates the mean between data points of a single column over a variable number of rows, based on a second column. I envision this function looking something like WINDOWMEAN(data, mileage, 100)
. This would compute the mean of all data
over the rows where the mileage
is within 100 miles of that of the current row.
Example:
| data | mileage |
| 10 | 1000 |
| 15 | 1009 |
| 20 | 1056 |
| 16 | 1098 |
| 13 | 1130 |
| 14 | 1200 |
With a query like SELECT WINDOWMEAN(data, mileage, 100) AS a FROM t
, I would expect:
| data | mileage | a |
| 10 | 1000 | 10 |
| 15 | 1009 | 12.5 |
| 20 | 1056 | 15 |
| 16 | 1098 | 15.25|
| 13 | 1130 | 16.33|
| 14 | 1200 | 13.5 |
a
is calculated for each row as the mean of all rows within 100 miles of mileage
that come before the current row.
Where I am getting stuck is how to access previous rows in a custom SQL function. I am not certain what I am trying to accomplish is even possible because I've yet to find documentation for accessing other rows in this way.
Whether specific to this solution or not, is there any way to access previous rows in a custom SQL scalar function?
(I can make the assumption that the rows are ordered by mileage)
You can use a window function with the appropriate RANGE
frame definition to limit the window to rows within 100 miles of the current one:
SELECT data, mileage
, avg(data) OVER (ORDER BY mileage RANGE BETWEEN 100 PRECEDING AND CURRENT ROW) AS a
FROM t
ORDER BY mileage;
data mileage a
---------- ---------- ----------
10 1000 10.0
15 1009 12.5
20 1056 15.0
16 1098 15.25
13 1130 16.3333333
14 1200 13.5
Notes:
This query requires Sqlite 3.28 or newer, as that version had some major enhancements to window functions to allow, among other things, numeric ranges like this.
For best results, create an index on t(mileage)
or a covering index on t(mileage, data)
.
Non-window function version using a correlated subquery (Also works better with that index):
SELECT data, mileage
, (SELECT avg(t2.data) FROM t AS t2
WHERE t2.mileage BETWEEN t1.mileage - 100 AND t1.mileage) AS a
FROM t AS t1
ORDER BY mileage;