In this scenario is the SELECT TOP necessary within the CTE (micro optimization, I know...).
DECLARE @pageSize INT, @currentPage INT, @top INT
SET @pageSize = 10
SET @currentPage = 150
SET @top = @pageSize * @currentPage + @pageSize
WITH t AS
(
SELECT TOP(***@top***) ID, name
ROW_NUMBER() OVER (ORDER BY ID) AS _row,
FROM dbo.User
)
SELECT TOP(@pageSize) *
FROM t
WHERE t._row > (@top-@pageSize)
ORDER BY t.ID
The above returns 10 (@pageSize) rows from a start number (@top-@pageSize) in a specific order with a row number column. Does the CTE statement know that the "SELECT TOP" outside of the CTE and the WHERE-clause, also outside the CTE, is to come, hence the CTE never returns more rows in the specific order than needed?
Basically just talking about the ROW_NUMBER function, that it does not count a row number for rows not returned (if I were to have millions of rows...), and also if I were to select top 100 in the CTE, would row_number still be calculated for all the million rows within the table selected?
I have tried with and without "SELECT TOP(@top)" in the CTE-statement, inside a loop with 10.000 runs, without seeing any difference in time usage. Though, I have only 38.000 rows in the table at the moment.
Edit: So the result:
WITH t AS
(
**DO A TOP() WITH AN ORDER BY IN THE CTE**
SELECT TOP(@top) ID, name
ROW_NUMBER() OVER (ORDER BY ID) AS _row,
FROM dbo.User
ORDER BY ID
)
SELECT TOP(@pageSize) *
**SELECTING TOP N FROM THE CTE, WHERE ROW-NUMBER IS ... DUE TO THE CTE IS IN ORDER ALREADY**
FROM t
WHERE t._row > (@top-@pageSize)
This could probably be more efficient if I ORDERED them "backwards", selecting the "bottom @pageSize" of the CTE, which would leave out the where-clause... but that would require some test if it actually were faster...
The use of top
without an order by
is discouraged. There is no guarantee that you will get the rows that you want, so you should not include the top
. Or, you should include an order by id
, if that is the ordering that you want.
The user of top
doesn't affect the row_number()
calculation, because that calculation is going to be done before the top
is applied. You can imagine having another window function there, such as sum() over ()
to understand that the top
cannot generally be applied before the row_number()
and finding the circumstances where it is safe would be hard work.