Search code examples
sqlsql-serverdatetimegreatest-n-per-groupwindow-functions

SQL CTE vs View


I have a transaction table with 65,000,000 rows.

it's divided into about a couple hundred Companies, and each Company is divided into a couple hundred or thousand unique Users.

Example:

CREATE TABLE [dbo].[TestTransaction]
(
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [CompanyId] [int] NULL,
    [UserId] [int] NULL,
    [LogDateTime] [datetime] NULL,
    [Key1] [nvarchar](20) NULL,
) ON [PRIMARY]
GO

For a specific company I need to get all users and the key value at the user's minimum LogDateTime and at the user's maximum LogDateTime.

I want to do it this way:

;with cteGetMin as
(
    select 
        CompanyId
        , UserId
        , LogDateTime
        , Key1
        , Row_Number() over (partition by UserId order by LogDateTime) as rowNumMin
    from [dbo].[TestTransaction]
    where CompanyId = @companyId
)
,cteGetMax as
(
    select 
        CompanyId
        , UserId
        , LogDateTime
        , Key1
        , Row_Number() over (partition by UserId order by LogDateTime desc) as rowNumMax
    from [dbo].[TestTransaction]
    where CompanyId = @companyId
)
select 
    ma.CompanyId
    ,ma.UserId
    ,ma.Key1 as MaxKey
    ,ma.LogDateTime as MaxLogDateTime
    ,mi.Key1 as MinKey
    ,mi.LogDateTime as MinLogDateTime
from cteGetMax ma
join cteGetMin mi
    on mi.CompanyId = ma.CompanyId
    and mi.userId = ma.userId
    and mi.rowNumMin = 1
    and ma.rowNumMax = 1

I could break out each CTE as a separate view (or indexed view?) but the views would be operating on the entire table of 65,000,000 rows.

My first thought is that adding the WHERE clause to each CTE will limit the sub results thus improving the performance.

Any thoughts? Should I keep the CTE's, or go with the separate views?

EDIT: The point I was trying to make is, is it more efficient to have a limiting WHERE clause in a sub-query or in a CTE, or is it more efficient to have a view that operated over the entire 65,000,000 rows?


Solution

  • You don't really need two subqueries and a join. Instead, you can enumerate the rows both ways with row_number(), and then filter once, and do conditional aggregation:

    select CompanyId, UserId,
        max(case when rowNumMax = 1 then Key1 end) as MaxKey,
        max(LogDateTime)                           as MaxLogDateTime,
        min(case when rowNumMin = 1 then Key1 end) as MinKey,
        min(LogDateTime)                           as MinLogDateTime
    from (
        select t.*, 
            row_number() over (partition by UserId order by LogDateTime) as rowNumMin,
            row_number() over (partition by UserId order by LogDateTime desc) as rowNumMax
        from [dbo].[TestTransaction] t
        where CompanyId = @companyId
    ) t
    where 1 in (rowNumMin, rowNumMax)
    group by CompanyId, UserId