Search code examples
sql-servert-sqlgroupingaggregate-functions

Is there really no First/Last aggregate function in T-SQL?


I wonder why there is no First/Last Aggregate function in T-SQL? It's something I have to do so frequently, and I have tried several workarounds with windowed LAST_VALUE() and FIRST_VALUE(), with subqueries, with WITH statement views, but I feel like there has got to be a best way to do this?

Example Data and Workarounds tbl

id group order fact
01 1 1 101
02 1 2 105
03 2 1 103
... ... ... ...
SELECT DISTINCT group
                ,LAST_VALUE(fact) 
                 OVER(PARTITION BY group 
                      ORDERBY order 
                      RANGE BETWEEN UNBOUNDED PRECEDING 
                        AND UNBOUNDED FOLLOWING
                  ) AS LastFact

from tbl

This works fine unless I want to include the id or order columns. The order is usually a datetime, and I usually do want to include it, so then I have to join this table back to itself.

WITH cte AS (
SELECT group
       ,MAX(order)
FROM tbl
GROUP BY group
)
SELECT t.id
       ,t.group
       ,t.order
       ,t.fact
FROM tbl t
JOIN cte c
ON   t.group = c.group AND t.order = c.order
WHERE c.group IS NOT NULL and c.order IS NOT NULL

This is also fine, but when I'm working with large queries with lots of WITH statement views I run into realy long runtimes.

Similarly

SELECT t.ID
        ,t.group
        ,t.order
        ,t.fact
FROM    tbl t
INNER JOIN (
SELECT group
       ,MAX(order) as LastOrder
FROM tbl 
GROUP BY group) c 
ON t.group = c.group and t.order = c.order

this one is fine, but my readability goes south real fast when I have six of these in a query. And because of the organization I'm working in, readability is important. I also seem to recall an instructor at some point advising against subqueries when it can be avoided. Is that true?

My last resort is using a select statement inside the select statement, but this universally results in long runtimes, and they are complicated, and difficult to make work. I couldn't even make an example query work for the purposes of this post.

Are there any workarounds?


Solution

  • Is there really no First/Last aggregate function in T-SQL?

    No - there are no such aggregate functions in TSQL. The problem with FIRST and LAST is that you need a way of specifying that the rows going into the aggregate should be treated as having some sort of order.

    There are aggregate functions of this name implemented in Access but the documentation there indicates that sometimes "the records returned by these functions will be arbitrary" and I'm not an Access developer so not sure what criteria need to be met to be assured that they behave deterministically.

    Until fairly recently aggregates in SQL Server have not supported any mechanism for declaring an ordering.

    But both APPROX_PERCENTILE_CONT and STRING_AGG are TSQL "ordered set" aggregate functions that do allow an order to be specified.

    So an aggregate

    LAST(city) WITHIN GROUP (order by population)
    

    Could be added. But hasn't been. Itzik Ben Gan discusses these here and specifically calls out FIRST_VALUE and LAST_VALUE as being potential candidates to make available as aggregate functions as well as analytic ones.

    Or potentially the Oracle approach could be adopted. Which I believe is something like.

    max(city) keep (dense_rank first order by population desc)
    

    Though from the response to this feedback item requesting the Oracle approach this option definitely looks unlikely.

    Regarding the first option I could find this feature request - currently at 4 votes.