I wonder why there is no First/Last Aggregate function in T-SQL? It's something I have to do so frequently, and I have tried several workarounds with windowed LAST_VALUE() and FIRST_VALUE(), with subqueries, with WITH statement views, but I feel like there has got to be a best way to do this?
Example Data and Workarounds tbl
id | group | order | fact |
---|---|---|---|
01 | 1 | 1 | 101 |
02 | 1 | 2 | 105 |
03 | 2 | 1 | 103 |
... | ... | ... | ... |
SELECT DISTINCT group
,LAST_VALUE(fact)
OVER(PARTITION BY group
ORDERBY order
RANGE BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING
) AS LastFact
from tbl
This works fine unless I want to include the id or order columns. The order is usually a datetime, and I usually do want to include it, so then I have to join this table back to itself.
WITH cte AS (
SELECT group
,MAX(order)
FROM tbl
GROUP BY group
)
SELECT t.id
,t.group
,t.order
,t.fact
FROM tbl t
JOIN cte c
ON t.group = c.group AND t.order = c.order
WHERE c.group IS NOT NULL and c.order IS NOT NULL
This is also fine, but when I'm working with large queries with lots of WITH statement views I run into realy long runtimes.
Similarly
SELECT t.ID
,t.group
,t.order
,t.fact
FROM tbl t
INNER JOIN (
SELECT group
,MAX(order) as LastOrder
FROM tbl
GROUP BY group) c
ON t.group = c.group and t.order = c.order
this one is fine, but my readability goes south real fast when I have six of these in a query. And because of the organization I'm working in, readability is important. I also seem to recall an instructor at some point advising against subqueries when it can be avoided. Is that true?
My last resort is using a select statement inside the select statement, but this universally results in long runtimes, and they are complicated, and difficult to make work. I couldn't even make an example query work for the purposes of this post.
Are there any workarounds?
Is there really no First/Last aggregate function in T-SQL?
No - there are no such aggregate functions in TSQL. The problem with FIRST
and LAST
is that you need a way of specifying that the rows going into the aggregate should be treated as having some sort of order.
There are aggregate functions of this name implemented in Access but the documentation there indicates that sometimes "the records returned by these functions will be arbitrary" and I'm not an Access developer so not sure what criteria need to be met to be assured that they behave deterministically.
Until fairly recently aggregates in SQL Server have not supported any mechanism for declaring an ordering.
But both APPROX_PERCENTILE_CONT
and STRING_AGG
are TSQL "ordered set" aggregate functions that do allow an order to be specified.
So an aggregate
LAST(city) WITHIN GROUP (order by population)
Could be added. But hasn't been. Itzik Ben Gan discusses these here and specifically calls out FIRST_VALUE
and LAST_VALUE
as being potential candidates to make available as aggregate functions as well as analytic ones.
Or potentially the Oracle approach could be adopted. Which I believe is something like.
max(city) keep (dense_rank first order by population desc)
Though from the response to this feedback item requesting the Oracle approach this option definitely looks unlikely.
Regarding the first option I could find this feature request - currently at 4 votes.