I have a transaction table with 65,000,000 rows.
it's divided into about a couple hundred Companies, and each Company is divided into a couple hundred or thousand unique Users.
Example:
CREATE TABLE [dbo].[TestTransaction]
(
[Id] [int] IDENTITY(1,1) NOT NULL,
[CompanyId] [int] NULL,
[UserId] [int] NULL,
[LogDateTime] [datetime] NULL,
[Key1] [nvarchar](20) NULL,
) ON [PRIMARY]
GO
For a specific company I need to get all users and the key value at the user's minimum LogDateTime
and at the user's maximum LogDateTime
.
I want to do it this way:
;with cteGetMin as
(
select
CompanyId
, UserId
, LogDateTime
, Key1
, Row_Number() over (partition by UserId order by LogDateTime) as rowNumMin
from [dbo].[TestTransaction]
where CompanyId = @companyId
)
,cteGetMax as
(
select
CompanyId
, UserId
, LogDateTime
, Key1
, Row_Number() over (partition by UserId order by LogDateTime desc) as rowNumMax
from [dbo].[TestTransaction]
where CompanyId = @companyId
)
select
ma.CompanyId
,ma.UserId
,ma.Key1 as MaxKey
,ma.LogDateTime as MaxLogDateTime
,mi.Key1 as MinKey
,mi.LogDateTime as MinLogDateTime
from cteGetMax ma
join cteGetMin mi
on mi.CompanyId = ma.CompanyId
and mi.userId = ma.userId
and mi.rowNumMin = 1
and ma.rowNumMax = 1
I could break out each CTE as a separate view (or indexed view?) but the views would be operating on the entire table of 65,000,000 rows.
My first thought is that adding the WHERE
clause to each CTE will limit the sub results thus improving the performance.
Any thoughts? Should I keep the CTE's, or go with the separate views?
EDIT: The point I was trying to make is, is it more efficient to have a limiting WHERE clause in a sub-query or in a CTE, or is it more efficient to have a view that operated over the entire 65,000,000 rows?
You don't really need two subqueries and a join. Instead, you can enumerate the rows both ways with row_number()
, and then filter once, and do conditional aggregation:
select CompanyId, UserId,
max(case when rowNumMax = 1 then Key1 end) as MaxKey,
max(LogDateTime) as MaxLogDateTime,
min(case when rowNumMin = 1 then Key1 end) as MinKey,
min(LogDateTime) as MinLogDateTime
from (
select t.*,
row_number() over (partition by UserId order by LogDateTime) as rowNumMin,
row_number() over (partition by UserId order by LogDateTime desc) as rowNumMax
from [dbo].[TestTransaction] t
where CompanyId = @companyId
) t
where 1 in (rowNumMin, rowNumMax)
group by CompanyId, UserId