Search code examples
sqlsql-server-2008ssrs-2008

Struggling to find the right WHERE clause


I'm struggling with a SQL query and I need your help. To be honest, I'm starting to wonder if what I want to achieve can be done the way I did it so far but maybe your collective brains can come up with a better solution than mine and prove me I took the good way at the beginning (Or that I was totally wrong and I should start from scratch).

The Dataset

A row has 4 important fields: ItemID, Item, Priority and Group. Those fields contain the only valuable piece of information, the one that will be displayed in the end.

As I'm using SQL Server 2008, I don't have access to the LAG and LEAD function so I needed to simulate them (Or at least, I did it because I thought it would be useful to me but I'm not so sure anymore). To obtain this result, I used the code from this article from SQLscope which provide you with a LAG and LEAD equivalent that I restrict to a set of row that have the same ItemID. This adds 7 new functional columns to my dataset: Rn, RnDiv2, RnPlus1Div2, PreviousPriority, NextPriority, PreviousGroup and NextGroup.

ItemID      | Item      | Priority  | Group     | Rn        | RnDiv2    | RnPlus1Div2   | PreviousPriority  | NextPriority  | PreviousGroup     | NextGroup 
--------    | -------   | --------  | -------   | -----     | ------    | -----------   | ----------------  | ------------  | -------------     | --------- 
16777397    | Item 1    | 5         | Group 1   | 1         | 0         | 1             | NULL              | NULL          | NULL              | NULL      
16777403    | Item 2    | 5         | Group 2   | 1         | 0         | 1             | NULL              | 5             | NULL              | Group 2   
16777403    | Item 2    | 10        | Group 2   | 2         | 1         | 1             | 5                 | NULL          | Group 2           | NULL      
16777429    | Item 3    | 1000      | Group 3   | 1         | 0         | 1             | NULL              | NULL          | NULL              | NULL      
16777430    | Item 4    | 5         | Group 1   | 1         | 0         | 1             | NULL              | NULL          | NULL              | NULL      
16777454    | Item 5    | 5         | Group 4   | 1         | 0         | 1             | NULL              | NULL          | NULL              | NULL      
16777455    | Item 6    | 5         | Group 5   | 1         | 0         | 1             | NULL              | NULL          | NULL              | NULL      
16777459    | Item 6    | 5         | Group 6   | 1         | 0         | 1             | NULL              | NULL          | NULL              | NULL      
16777468    | Item 8    | 5         | Group 7   | 1         | 0         | 1             | NULL              | NULL          | NULL              | NULL      
16777479    | Item 9    | 5         | Group 4   | 1         | 0         | 1             | NULL              | NULL          | NULL              | NULL      
16777481    | Item 10   | 5         | Group 4   | 1         | 0         | 1             | NULL              | NULL          | NULL              | NULL      
16777496    | Item 11   | 5         | Group 6   | 1         | 0         | 1             | NULL              | NULL          | NULL              | NULL      
16777514    | Item 12   | 5         | Group 4   | 1         | 0         | 1             | NULL              | NULL          | NULL              | NULL      
16777518    | Item 13   | 5         | Group 8   | 1         | 0         | 1             | NULL              | 10            | NULL              | Group 8   
16777518    | Item 13   | 10        | Group 8   | 2         | 1         | 1             | 5                 | 100           | Group 8           | Group 1   
16777518    | Item 13   | 100       | Group 1   | 3         | 1         | 2             | 10                | NULL          | Group 8           | NULL      
16777520    | Item 14   | 5         | Group 9   | 1         | 0         | 1             | NULL              | NULL          | NULL              | NULL      

The problem

The problem in my SQL query is the WHERE clause. I will always filter the rows based on their Group column. But there are some subtlety. Whatever the number of Group an Item is member of, I want it to appear in one and only one Group based on these criteria :

  1. If the Item appears in the same Group more than one time, only the line with the lowest priority should be returned. If an Item appears more than one time in the same Group but with the same Priority, then only the first occurrence should be kept Example: for Item 2, only the line with a Priority value of 5 should be returned;
  2. If the Item appears in the Group but is also present in another Group with a lowest Priority, it shouldn't be displayed. Example: Group 1 is selected as a filter. Item 1 should be displayed but Item 13 shouldn't because it is also present in Group 8 with a lower Priority (Item 13 would appear only in Group 8).

Note that this is just a sample. My real dataset has more than 3000 rows and some other cases are probably possible that I haven't listed in my sample.

Unsuccessful Attempts

Like I said, there is one constant in the WHERE clause and that is the Group filtering.

  • Because of the criterion #2, I can't simply start my clause like that : WHERE Group = 'Group 1' and I need to have something a bit more complex.
  • I have tried the following clause without success : WHERE Group = 'Group 1' AND (Group = NextGroup AND Priority < NextPriority). That works well in the case of an Item that is in no more that 2 groups. But for Item 13, it would return the first two rows. And if I add something like AND NOT (CorrectedPriority >= PreviousPriority) to the WHERE clause, I get no results at all.
  • Last attempt so far : (SiteName <> PreviousSiteName AND CorrectedPriority >= PreviousPriority). The problem is that I will never return a line where Rn = 1 because PreviousSiteName will be equal to NULL. Adding a check on NULL doesn't work either. I must have bee tired when trying this particular clause because it's complete garbage.

I will continue to try and find the good WHERE clause but I have the feeling that my whole approach is wrong. I don't see how I could solve the problem when there are more than two entries for the same Item. It is worth noting that this query is used in a SSRS report so I could maybe use custom code to parse the dataset and filter the rows (Working with tables might help solving the issue of Items with more than two entries). But if there's a SQL genius around here with a working solution, that would be great.

PS : if someone knows how to fix this table and can explain it to me, extra cookies for him. :D

Edit :

This is the modified query that I'm using at the moment. I will consider using @Yellowbedwetter's latest query has it seems more robust.

SELECT * 
  FROM (SELECT ItemID,
               Item,
               Priority,
               Group_,
               MIN(Priority) OVER
                 ( PARTITION BY item
                 ) AS interItem_MinPriority
          FROM (SELECT ItemID,
                       Item,
                       Priority,
                       Group_,
                       ROW_NUMBER() OVER
                         ( PARTITION BY Item
                               ORDER BY Priority ASC
                         ) AS interGrp_Rank
                  FROM Test_Table 
               ) AS TMP
         WHERE interGrp_Rank = 1 -- Exclude all records with the same item/group, but higher priority.
       ) AS TMP2
 WHERE Priority = interItem_MinPriority; -- Exclude which aren't the lowest priority across groups.

Solution

  • If I understand the question correctly this should work

    SELECT * 
      FROM (SELECT ItemID,
                   Item,
                   Priority,
                   Group_,
                   MIN(Priority) OVER
                     ( PARTITION BY item
                     ) AS interItem_MinPriority
              FROM (SELECT ItemID,
                           Item,
                           Priority,
                           Group_,
                           ROW_NUMBER() OVER
                             ( PARTITION BY Item,
                                            Group_
                                   ORDER BY Priority ASC
                             ) AS interGrp_Rank
                      FROM Test_Table 
                   ) AS TMP
             WHERE interGrp_Rank = 1 -- Exclude all records with the same item/group, but higher priority.
           ) AS TMP2
     WHERE Priority = interItem_MinPriority; -- Exclude which aren't the lowest priority across groups.
    

    I don't know if your version of SQL Server supports MIN() OVER()..., but if not you should be able to work around that easily enough.

    Edit: To handle tie breaks.

    WITH TEST_TABLE (ItemID, Item, Priority, Group_) AS 
     (
     SELECT '16777397','Item 1','5','Group 1' UNION 
     SELECT '16777403','Item 2','5','Group 2' UNION 
     SELECT '16777403','Item 2','10','Group 2' UNION 
     SELECT '16777429','Item 3','1000','Group 3' UNION 
     SELECT '16777430','Item 4','5','Group 1' UNION 
     SELECT '16777454','Item 5','5','Group 4' UNION 
     SELECT '16777455','Item 6','5','Group 5' UNION 
     SELECT '16777459','Item 6','5','Group 6' UNION 
     SELECT '16777468','Item 8','5','Group 7' UNION 
     SELECT '16777479','Item 9','5','Group 4' UNION 
     SELECT '16777481','Item 10','5','Group 4' UNION 
     SELECT '16777496','Item 11','5','Group 6' UNION 
     SELECT '16777514','Item 12','5','Group 4' UNION 
     SELECT '16777518','Item 13','5','Group 8' UNION 
     SELECT '16777518','Item 13','10','Group 8' UNION 
     SELECT '16777518','Item 13','100','Group 1' UNION 
     SELECT '16777520','Item 14','5','Group 9'
     ) 
    
     SELECT ItemID,
            Item,
            Priority,
            Group_
       FROM (SELECT ItemID,
                    Item,
                    Priority,
                    Group_,
                    ROW_NUMBER() OVER
                      ( PARTITION BY item
                            ORDER BY Group_ ASC -- or however you want to break the tie
                      ) AS grp_minPriority_TieBreak
               FROM (SELECT ItemID,
                            Item,
                            Priority,
                            Group_,
                            MIN(Priority) OVER
                              ( PARTITION BY item
                              ) AS interItem_MinPriority
                       FROM (SELECT ItemID,
                                    Item,
                                    Priority,
                                    Group_,
                                    ROW_NUMBER() OVER
                                      ( PARTITION BY Item,
                                                     Group_
                                            ORDER BY Priority ASC
                                      ) AS interGrp_Rank
                               FROM TEST_TABLE 
                            ) AS TMP
                      WHERE interGrp_Rank = 1 -- Exclude all records with the same item/group, but higher priority.
                    ) AS TMP2
              WHERE Priority = interItem_MinPriority -- Exclude which aren't the lowest priority across groups.
           ) AS TMP2
      WHERE grp_minPriority_TieBreak = 1;