Search code examples
sql-servergroup-bycasequery-optimization

How I can select specific record from a GROUP BY without over-load the process?


I have n record from the table EAT OR DRINK that correspond to some events that can have duplicate IDs.

For every event, I need to group by ID and make in evidence always things that we can eat (GROUP 1) instead things that we can drink (GROUP 2). In synthesis, I will put GROUP 2 string only if there are not string of GROUP 1.

GROUP 1: ('PIZZA',POTATO','EGGS','STRAWBERRY') 
GROUP 2: ('COCA COLA','ORANGE JUICE','BITTER','TEA')

when I GROUP BY ID, strings of the GROUP 1 win ALWAYS between string of GROUP 2. Win it means that I need always to take in a group a string of the GROUP 1 instead of the GROUP 2 but if there's no GROUP 1 string in a group I need to take GROUP 2 string.

After the group by, I chose to use the MAX function and a case when inside putting some conditions.

I did this query and it works, but when I use this in another view the performance are very low. It is strange because when i Run this alone, it works in 4 seconds but inside another view seems very very bad and Slowly and I cannot use in production:

SELECT 
    a.NETWORKID,
    CASE WHEN a.[MOTIVO DESCRIZIONE] IS NOT NULL THEN a.[MOTIVO DESCRIZIONE] ELSE b.[MOTIVO DESCRIZIONE] END AS [MOTIVO_DESCRIZIONE]
FROM
    (SELECT
         a.NETWORKID,
         MAX(CASE WHEN a.[MOTIVO DESCRIZIONE] = 'PIZZA' or a.[MOTIVO DESCRIZIONE] = 'POTATO' or a.[MOTIVO DESCRIZIONE] = 'EGGS' or a.[MOTIVO DESCRIZIONE] = 'STRAWBERRY' THEN a.[MOTIVO DESCRIZIONE] ELSE NULL END) AS [MOTIVO DESCRIZIONE]
     FROM dbo.v_eventi a
     GROUP BY a.NETWORKID, a.[DDT NUMERO]) a
LEFT OUTER JOIN
    (SELECT
         a.NETWORKID,
         MAX(CASE WHEN a.[MOTIVO DESCRIZIONE] = 'COCA COLA' or a.[MOTIVO DESCRIZIONE] = 'ORANGE JUICE' or a.[MOTIVO DESCRIZIONE] = 'BITTER' or a.[MOTIVO DESCRIZIONE] = 'TEA'  THEN a.[MOTIVO DESCRIZIONE] ELSE NULL END) AS [MOTIVO DESCRIZIONE]
     FROM dbo.v_eventi a
     GROUP BY a.NETWORKID, a.[DDT NUMERO]) b ON a.NETWORKID = b.NETWORKID

Could someone help me and have better ideas to make faster this query? or do I need to use a stored procedure that fills a table every day with this transformations?

Thanks to all the kind community in advance.


Solution

  • I see several opportunities for improvement.

    1. In the final select, the CASE statement used to select between the food or the drink can be replaced with a COALESCE() or ISNULL() function. The code would then become ISNULL(a.[MOTIVO DESCRIZIONE], b.[MOTIVO DESCRIZIONE]) AS [MOTIVO_DESCRIZIONE].
    2. The multiple comparisons in each subselect can be replaced with an IN (...) condition. The code would then become WHEN a.[MOTIVO DESCRIZIONE] IN ('PIZZA', 'POTATO', 'EGGS', 'STRAWBERRY') and similar for the drink subselect.
    3. (Very minor) In your conditional aggregation CASE expressions, the ELSE NULL part can be dropped, as that is the default behavior when no explicit ELSE is present.
    4. I see no reason why the two subselects cannot be combined, with both the food and drink calculations performed and combined into a single select. Using ISNULL() in that final select avoids duplicate expressions, that might have been an issue with the original CASE expression use.

    The result would be something like:

    SELECT
        a.NETWORKID,
        -- a.[DDT NUMERO],
        ISNULL(
            MAX(CASE
                WHEN a.[MOTIVO DESCRIZIONE] IN ('PIZZA', 'POTATO', 'EGGS', 'STRAWBERRY')
                THEN a.[MOTIVO DESCRIZIONE]
                END),
            MAX(CASE
                WHEN a.[MOTIVO DESCRIZIONE] IN ('COCA COLA', 'ORANGE JUICE', 'BITTER', 'TEA')
                THEN a.[MOTIVO DESCRIZIONE]
                END)
        ) AS [MOTIVO_DESCRIZIONE]
    FROM dbo.v_eventi a
    GROUP BY a.NETWORKID, a.[DDT NUMERO]