Search code examples
sqlsql-servert-sqlquery-optimization

How to optimize sql multiple select queries?


I have a query that calculates wait time for each record from the Transactions table and calculates SUM and MAX wait time for each of the group below based on receivedDate and claimedDt. Here is a basic query:

SELECT
    '2022-06-01' as reportDate,
    waitTimeSubQuery.currentAssignedQueueId,
    waitTimeSubQuery.queueAccessPointId,
    waitTimeSubQuery.queueName AS QueueName,
    waitTimeSubQuery.queueReportCategory,
    waitTimeSubQuery.queuePriority,
    waitTimeSubQuery.queueOrganizationHierarchyId,
    COUNT(waitTimeSubQuery.id) AS totalCasesWaiting,
    SUM(CASE WHEN waitTimeSubQuery.waitTimeMinutes > 0 THEN waitTimeSubQuery.waitTimeMinutes ELSE 0 END) AS sumWaitTimeMinutes,
    MAX(CASE WHEN waitTimeSubQuery.waitTimeMinutes > 0 THEN waitTimeSubQuery.waitTimeMinutes ELSE 0 END) AS maxWaitTimeMinutes
FROM (SELECT
    id,
    currentAssignedQueueId,
    queueAccessPointId,
    queueName,
    queueReportCategory,
    queuePriority,
    queueOrganizationHierarchyId,
(CASE WHEN (receivedDateUTC > '0001-01-01T00:00:00Z' OR receivedDate > '0001-01-01T00:00:00Z') AND (appointmentDT IS NULL OR appointmentDT < '2022-09-28T12:58:47')

THEN CAST(DateDiff(MINUTE,
    CASE WHEN receivedDateUTC > '0001-01-01T00:00:00Z'
    THEN CONCAT(SUBSTRING(CAST(receivedDateUTC AS VARCHAR), 0, 19), 'Z')
    ELSE TRY_CAST(CONCAT(CONCAT(SUBSTRING(CAST(receivedDate AS VARCHAR), 0, 10), SUBSTRING(CAST(createdDT AS VARCHAR), 10, 9)), 'Z') AS DATETIME2) END,
    CASE WHEN claimedDT > '0001-01-01T00:00:00Z' AND claimedDT < '2022-06-01T23:59:00Z' AND transactionStatus != 'WaitingAssignment'
    THEN CONCAT(SUBSTRING(CAST(claimedDT AS VARCHAR), 0, 19), 'Z')
    ELSE '2022-06-01T23:59:00Z' END
) AS BIGINT)
ELSE 0 END) AS waitTimeMinutes
FROM @transactionsList
WHERE receivedDate <= '2022-06-01T00:00:00'
AND (claimedDT >= '2022-06-02T00:00:00' OR transactionStatus = 'WaitingAssignment')) waitTimeSubQuery
GROUP BY waitTimeSubQuery.currentAssignedQueueId, 
    waitTimeSubQuery.queueAccessPointId, 
    waitTimeSubQuery.queueName, 
    waitTimeSubQuery.queueReportCategory, 
    waitTimeSubQuery.queuePriority, 
    waitTimeSubQuery.queueOrganizationHierarchyId

I want to calculate statistics for each day for the period of 30 days and the only difference is dates used for calculating waitTimeMinutes (based on endDate, claimedDt, receivedDate) and filtering by receivedDate and claimedDt.

I tried to save sub-set of the Transactions data into a table variable and reuse it in multiple select queries to get the statistics for each day, but this script runs too slow

Here is the code:

DECLARE @transactionsList TABLE (
    id UNIQUEIDENTIFIER, 
    currentAssignedQueueId UNIQUEIDENTIFIER, 
    queueAccessPointId UNIQUEIDENTIFIER, 
    queueName VARCHAR(100), 
    queueReportCategory VARCHAR(100), 
    queuePriority INT, 
    queueOrganizationHierarchyId UNIQUEIDENTIFIER,
    receivedDate DATE,
    claimedDT DATE,
    transactionStatus VARCHAR(100),
    receivedDateUTC DATE,
    appointmentDT DATE,
    createdDT DATE)


INSERT INTO @transactionsList
SELECT
    Transactions.id,
    Transactions.currentAssignedQueueId, 
    Queues.accessPointId as queueAccessPointId,
    Queues.name as queueName,
    Queues.reportCategory as queueReportCategory,
    Queues.priority as queuePriority,
    Queues.organizationHierarchyId as queueOrganizationHierarchyId,
    Transactions.receivedDate,
    Transactions.claimedDT,
    Transactions.transactionStatus,
    Transactions.receivedDateUTC,
    Transactions.appointmentDT,
    Transactions.createdDT
FROM Transactions 
LEFT JOIN Queues ON Transactions.currentAssignedQueueId = Queues.Id
WHERE Transactions.consumerId = '66458f4a-b3d4-4f80-93d4-5aa3ea123249'
    AND Transactions.isActive = 1
    AND Transactions.receivedDate <= '2022-06-30T00:00:00'
    AND (Transactions.claimedDT >= '2022-06-02T00:00:00' OR Transactions.transactionStatus = 'WaitingAssignment')


--SELECT COUNT(*) FROM @transactionsList

-- 2022-06-01
SELECT
    '2022-06-01' as reportDate,
    waitTimeSubQuery.currentAssignedQueueId,
    waitTimeSubQuery.queueAccessPointId,
    waitTimeSubQuery.queueName AS QueueName,
    waitTimeSubQuery.queueReportCategory,
    waitTimeSubQuery.queuePriority,
    waitTimeSubQuery.queueOrganizationHierarchyId,
    COUNT(waitTimeSubQuery.id) AS totalCasesWaiting,
    SUM(CASE WHEN waitTimeSubQuery.waitTimeMinutes > 0 THEN waitTimeSubQuery.waitTimeMinutes ELSE 0 END) AS sumWaitTimeMinutes,
    MAX(CASE WHEN waitTimeSubQuery.waitTimeMinutes > 0 THEN waitTimeSubQuery.waitTimeMinutes ELSE 0 END) AS maxWaitTimeMinutes
FROM (SELECT
    id,
    currentAssignedQueueId,
    queueAccessPointId,
    queueName,
    queueReportCategory,
    queuePriority,
    queueOrganizationHierarchyId,
(CASE WHEN (receivedDateUTC > '0001-01-01T00:00:00Z' OR receivedDate > '0001-01-01T00:00:00Z') AND (appointmentDT IS NULL OR appointmentDT < '2022-09-28T12:58:47')

THEN CAST(DateDiff(MINUTE,
    CASE WHEN receivedDateUTC > '0001-01-01T00:00:00Z'
    THEN CONCAT(SUBSTRING(CAST(receivedDateUTC AS VARCHAR), 0, 19), 'Z')
    ELSE TRY_CAST(CONCAT(CONCAT(SUBSTRING(CAST(receivedDate AS VARCHAR), 0, 10), SUBSTRING(CAST(createdDT AS VARCHAR), 10, 9)), 'Z') AS DATETIME2) END,
    CASE WHEN claimedDT > '0001-01-01T00:00:00Z' AND claimedDT < '2022-06-01T23:59:00Z' AND transactionStatus != 'WaitingAssignment'
    THEN CONCAT(SUBSTRING(CAST(claimedDT AS VARCHAR), 0, 19), 'Z')
    ELSE '2022-06-01T23:59:00Z' END
) AS BIGINT)
ELSE 0 END) AS waitTimeMinutes
FROM @transactionsList
WHERE receivedDate <= '2022-06-01T00:00:00'
AND (claimedDT >= '2022-06-02T00:00:00' OR transactionStatus = 'WaitingAssignment')) waitTimeSubQuery
GROUP BY waitTimeSubQuery.currentAssignedQueueId, 
    waitTimeSubQuery.queueAccessPointId, 
    waitTimeSubQuery.queueName, 
    waitTimeSubQuery.queueReportCategory, 
    waitTimeSubQuery.queuePriority, 
    waitTimeSubQuery.queueOrganizationHierarchyId

And the part from '2022-06-01' repeats 30 times for each day from the range respectively.

Is there any possibility to optimize this script? For my data it runs nearly 49 seconds and the number of the records in the table is only 7kk and it's far way more on other environment.

---EDIT---

Here are the definitions for the tables Transactions and Queues

CREATE TABLE [dbo].[Transactions] (
    [id]                            UNIQUEIDENTIFIER   NULL,
    [accessPointId]                 UNIQUEIDENTIFIER   NULL,
    [isNonLobby]                    BIT                NULL,
    [printReceipt]                  BIT                NULL,
    [isExpressed]                   BIT                NULL,
    [isAssignedBySupervisor]        BIT                NULL,
    [dateAssignedBySupervisor]      VARCHAR (30)       NULL,
    [dateAssignedBySupervisorLocal] DATETIMEOFFSET (7) NULL,
    [supervisorOverrideId]          UNIQUEIDENTIFIER   NULL,
    [supervisorId]                  UNIQUEIDENTIFIER   NULL,
    [isNewHousehold]                BIT                NULL,
    [householdId]                   UNIQUEIDENTIFIER   NULL,
    [receivedDate]                  DATE               NULL,
    [receivedDateUTC]               DATETIME2 (7)      NULL,
    [transactionStatus]             VARCHAR (20)       NULL,
    [claimedDT]                     DATETIME2 (7)      NULL,
    [claimedDTLocal]                DATE               NULL,
    [initiallyClaimedById]          UNIQUEIDENTIFIER   NULL,
    [completedDT]                   DATETIME2 (7)      NULL,
    [appointmentDT]                 DATETIME2 (7)      NULL,
    [completedDTLocal]              DATE               NULL,
    [completedById]                 UNIQUEIDENTIFIER   NULL,
    [remarks]                       VARCHAR (MAX)      NULL,
    [currentAssignedQueueId]        UNIQUEIDENTIFIER   NULL,
    [currentAssignedUserId]         UNIQUEIDENTIFIER   NULL,
    [preTriageTransactionId]        UNIQUEIDENTIFIER   NULL,
    [isAddClaim]                    BIT                NULL,
    [receiptId]                     UNIQUEIDENTIFIER   NULL,
    [ticketNumber]                  VARCHAR (100)      NULL,
    [deliNumber]                    VARCHAR (100)      NULL,
    [tasksCount]                    BIGINT             NULL,
    [projectedWaitTime]             BIGINT             NULL,
    [actualWaitTimeMinutes]         BIGINT             NULL,
    [assignWaitTime]                BIGINT             NULL,
    [transactionTimeMinutes]        BIGINT             NULL,
    [triageTimeMinutes]             BIGINT             NULL,
    [outcomeId]                     UNIQUEIDENTIFIER   NULL,
    [outcomeReasonId]               UNIQUEIDENTIFIER   NULL,
    [outcomeType]                   VARCHAR (50)       NULL,
    [dueDate]                       VARCHAR (30)       NULL,
    [dueDateLocal]                  DATETIMEOFFSET (7) NULL,
    [isDueDateToday]                VARCHAR (30)       NULL,
    [isUnknownHousehold]            BIT                NULL,
    [activityId]                    UNIQUEIDENTIFIER   NULL,
    [programId]                     UNIQUEIDENTIFIER   NULL,
    [reviewMonthDueDate]            VARCHAR (40)       NULL,
    [reviewMonthDueDateLocal]       DATETIMEOFFSET (7) NULL,
    [officeId]                      UNIQUEIDENTIFIER   NULL,
    [isOnceManuallyPaused]          BIT                NULL,
    [pkey]                          VARCHAR (100)      NULL,
    [isActive]                      BIT                NULL,
    [consumerId]                    UNIQUEIDENTIFIER   NULL,
    [organizationName]              VARCHAR (50)       NULL,
    [modifiedBy]                    UNIQUEIDENTIFIER   NULL,
    [modifiedDT]                    DATETIME2 (7)      NULL,
    [modifiedDTLocal]               DATE               NULL,
    [createdBy]                     UNIQUEIDENTIFIER   NULL,
    [createdDT]                     DATETIME2 (7)      NULL,
    [createdDTLocal]                DATE               NULL,
    [_ts]                           BIGINT             NULL,
    [type]                          VARCHAR (50)       NULL,
    [timezoneId]                    VARCHAR (50)       NULL
);



CREATE TABLE [dbo].[Queues] (
    [id]                                 UNIQUEIDENTIFIER NULL,
    [name]                               VARCHAR (255)    NULL,
    [accessPointId]                      UNIQUEIDENTIFIER NULL,
    [organizationHierarchyId]            UNIQUEIDENTIFIER NULL,
    [assignedOrganizationHierarchyLevel] VARCHAR (20)     NULL,
    [bundlingGroup]                      BIGINT           NULL,
    [reportCategory]                     VARCHAR (100)    NULL,
    [priority]                           INT              NULL,
    [businessProcessThreshold]           BIGINT           NULL,
    [calculateProjectedWaitTime]         BIT              NULL,
    [waitTimeUnits]                      VARCHAR (10)     NULL,
    [hasCarryOver]                       BIT              NULL,
    [defaultTransactionTimeMinutes]      BIGINT           NULL,
    [latestQueueMetricId]                UNIQUEIDENTIFIER NULL,
    [isAppointment]                      BIT              NULL,
    [consumerId]                         UNIQUEIDENTIFIER NULL,
    [organizationName]                   VARCHAR (50)     NULL,
    [modifiedBy]                         UNIQUEIDENTIFIER NULL,
    [modifiedDT]                         DATE             NULL,
    [createdBy]                          UNIQUEIDENTIFIER NULL,
    [createdDT]                          DATE             NULL,
    [_ts]                                BIGINT           NULL
);

And an index for the Transactions table (this is the only one index for this table and there are no indexes for the Queues table):

CREATE NONCLUSTERED INDEX [nci_wi_Transactions_EBAFDE3A7C2969265E76135FBA69188D] ON [dbo].[Transactions]
(
    [consumerId] ASC,
    [isActive] ASC,
    [receivedDate] ASC
)
INCLUDE([appointmentDT],[claimedDT],[createdDT],[currentAssignedQueueId],[id],[receivedDateUTC],[transactionStatus]) WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO

Here is the link on GitHub for the execution plan and the full query
https://github.com/sskotliar/query_optimization

PasteThePlan link


Solution

  • Option - 1

    You can improve the performance of your query by using a CTE. How does CTE work? The query you write inside the WITH command is executed only once, and its result is stored as a temporary table. This allows you to execute your large query with multiple join in it once. In the remaining cases, only its result is used. But as I understood you call your query with different filters. If this is true, so please see Option 2.

    Option - 2

    If the result of your repeatable query is large, and you call this query with different filters, it is recommended that you, insert result of repeatable query into temp table, then create needed indexes for this temp table. Then you can use that table as much as you need. Since your table has indexes so your filters will be works high performance.

    For each of the options mentioned above, it is recommended that you view Query Plan. Because the query plan may change from the DB side depending on the count of records in the tables.