There is a scheme used in a process to create a delta of changes in a dataset from one day to the next. The query needs to get the overall order of a record within the entire result set of a view, while returning only a subset of records. This, I guess, was going fine until an ugly view pushed the passed the 30 second ado timeout of the application. I thought it would be easy to fix this up, however, it is proving dauntless. The tables that the views utilize have millions of rows, but the indexing is maintained.
Query1 - Return just the 2 keys and row_number against millions of records on a view with a heap of left joins that may be optimized out. (I am having to change this from a select * with a row_number because the * is the problem)
Query2 - Join the big view against a key list of 25 ID's and then join Query1 that has the overall data offset.
Query3 Remove duplicates from joining the two datasets.
DECLARE @KeysToSelect TABLE(Key1 INT)
INSERT INTO @KeysToSelect VALUES(139743),(139878),(139953)
DECLARE @BigView TABLE(Key1 INT, Key2 INT, Area DECIMAL(18,2), BuildingType NVARCHAR(25))
INSERT @BigView VALUES
(100, NULL, 0,''),
(101, NULL, 0,''),
(200, NULL, 0,''),
(201, NULL, 0,''),
(139743, NULL, 8475.00,'Industrial'),
(139743, NULL, 593.00, 'Office'),
(139744, NULL, 0,''),
(139745, NULL, 0,''),
(139746, NULL, 0,''),
(139747, NULL, 0,''),
(139878, NULL, 1268.00,'Office'),
(139878, NULL, 15534.00,'Warehouse'),
(139879, NULL, 0,''),
(139880, NULL, 0,''),
(139881, NULL, 0,''),
(139953, 6, 20000.00,'Warehouse'),
(139953, 14,20000.00,'Office'),
(149956, NULL, 0,''),
(149957, NULL, 0,''),
(149958, NULL, 0,'')
;WITH OveralOrderInData AS
(
SELECT
--!!!! Cant't SELECT * FROM @BigView Here because it hits a 30 second timeout limit. This is what the solve is for.
--It would be easy just to return the data with a rank, however, in this case going into a skinny slice of the data to get overall count
--and joining that against a limited subset of the larger viuew returns in under s second
--Grabbing the 2 keys and ranking takes less than a second, lots of left joins and views calling views
ROW_NUMBER() OVER (ORDER BY vw.Key1 ASC, vw.Key2 ASC) AS OrderInData,
CASE WHEN Keys.Key1 IS NULL THEN NULL ELSE SUM(CASE WHEN Keys.Key1 IS NULL THEN NULL ELSE 1 END) OVER (ORDER BY vw.Key2 ASC,vw.Key1 ASC ROWS UNBOUNDED PRECEDING) END AS MatchedOrderInSearchKeys,
CASE WHEN Keys.Key1 IS NULL THEN 0 ELSE 1 END AS IsMatched,
vw.Key1,
vw.Key2
FROM
@BigView vw
LEFT OUTER JOIN @KeysToSelect Keys ON Keys.Key1 = vw.Key1
)
,Normalized AS
(
--Second dive into the data, this time, we can filter records based on a very selected Key list and join back our overall order data
--this can cause doubles and quadruples. Basically, a count will be associated with each duplicate Key1 and Key 2 value.
--somehow turn these over and rank the duplicates as 1 and 2 and 1 and 2 as opposed to 1 and 1 and 1 and 1 :(
SELECT
O.OrderInData,
O.MatchedOrderInSearchKeys,
O.Key1,
O.Key2,
vw.Area ,
vw.BuildingType,
DENSE_RANK() OVER(PARTITION BY O.Key1 ,O.Key2 ORDER BY OrderInData) AS DistributeOrder,
*
FROM
OveralOrderInData O
INNER JOIN @BigView vw ON vw.Key1 = O.Key1 AND (vw.Key2 = O.Key2 OR O.Key2 IS NULL)
WHERE
O.IsMatched = 1
)
SELECT
*
FROM
Normalized
WHERE
DistributeOrder = 1
The query above returns data in which the first OrderInData found for duplicates of Key1 and Key2 is realized, see below:
OrderInData | Key1 | Key2 | Area | BuildingType |
---|---|---|---|---|
5 | 139743 | NULL | 8475.00 | Industrial |
5 | 139743 | NULL | 593.00 | Office |
11 | 139878 | NULL | 1268.00 | Office |
11 | 139878 | NULL | 15534.00 | Warehouse |
16 | 139953 | 6 | 20000.00 | Warehouse |
17 | 139953 | 14 | 20000.00 | Office |
Desired Output
OrderInData | Key1 | Key2 | Area | BuildingType |
---|---|---|---|---|
5 | 139743 | NULL | 8475.00 | Industrial |
6 | 139743 | NULL | 593.00 | Office |
11 | 139878 | NULL | 1268.00 | Office |
12 | 139878 | NULL | 15534.00 | Warehouse |
16 | 139953 | 6 | 20000.00 | Warehouse |
17 | 139953 | 14 | 20000.00 | Office |
My next move would be to create virtualized tables in subqueries containing the distinct orders against items in keys and join again, however, that would take another two subqueries, and this was supposed to be a quick fix. Is there a silver bullet that I am just too daft to see at this point?
Without trying to reverse engineer anything before it, the final select can just have an expression sourced in row number:
SELECT
RealOrderInData = OrderInData - 1
+ ROW_NUMBER() OVER (PARTITION BY MatchedOrderInSearchKeys
ORDER BY BuildingType), -- should pick something more deterministic
*
FROM
Normalized
WHERE
DistributeOrder = 1;
It's quite possible the rest of the query could be consolidated / simplified, but that's a tall order at night. :-)