Search code examples
sqlselectgroup-bywhere-clausehaving

I am confused by SQL select from where in group by having


[![enter image description here][1]][1]now say I have a database, where there is a table kits_parts_needed, and I need to figure out what kit I have parts for, there are fields in this table that relate directly back to the inventory table, fk_inventory_key and fk_inventory_key_for_parts, I have a user scan some barcodes and those barcodes give me keys. I am then able to check this list against the database and hopefully come back with 1 kit. Here is the SQL.

SELECT fk_inventory_key
FROM Kits_Parts_Needed
WHERE 
 fk_inventory_Key_For_Parts IN ('7F983531-7AF8-4F10-A87C-EC555FA51BDB',
'B7FAEB96-21E2-44D5-93B1-ECE4545996FF')
GROUP BY fk_inventory_key
HAVING COUNT(DISTINCT fk_inventory_Key_For_Parts) = 2;

now for some reason this code works as this: it selects the fk_inventory_key from Kits_Parts_Needed where the fk_inventory_key_For_Parts is in the list of keys I have now accrued.

I am then grouping by fk_inventory_Key and using the having function because I want this to be related to both keys I have in my list. And it works. I have tested this on numerous kits, and it always comes back to the right kit.

Now my problem. The way the code is written you would think it would select these kits where the parts = part1 OR part2 and then it would only go and verify that these returned by the where clause have at least 2 relations NOT that those two relations match that which is in the IN clause... however that is exactly how it seems to work.

Can someone please explain to me how this is working as I don't think it really should be... or maybe I have a happy accident and it has just worked on the 5 kits I have tested it on.

I have tried removing the group by having and get:

C4520143-3F5C-4433-AA9B-41550103D71C
C4520143-3F5C-4433-AA9B-41550103D71C
46C42E3A-D7A3-4FA9-83B1-A3CA93395BF3

if I only have the group by and having I get:

0CDE77A4-C626-44F6-8B19-71C440827FC8
1AC67209-62C5-4E70-B218-DFF9608754E5
2005C859-F917-4790-BF4A-6184794020AF
208C3784-FD95-438D-9F5F-F4DECF376A03
24E8B3AA-0489-4C19-8C86-3DEE1173E9BA
28C562E3-3C89-4853-8616-36CD712D3838
2B48A4EA-821C-4F98-9E06-43E53057C9F9
3CE723D8-134F-4556-8DD1-EED4C5AEA4AB
6DF89B22-F25D-4C68-BDF7-773231F80A4D
7309666C-64EB-42BB-B712-8E040ED000EF
77FA3EF7-57B1-472B-A2EC-010AE40C52F7
78E32FB5-6FBF-490C-9314-37AF1AEA27D6
8DCCBD1A-37A9-4D15-86B0-3E9D30A18C02
9472BB5F-6562-4CBA-BBC2-6A4E14458D9B
94D92945-1D96-4D13-A4EC-885FF85804AC
A67C4C3C-D881-4251-80EB-AD04217B5801
C6A916B3-D3F9-4E83-B271-4F01863E9A6B
ECCE496C-F7F1-4D86-8FBE-EA6ED70A69C9

when I have both in there, I get:

C4520143-3F5C-4433-AA9B-41550103D71C

which is the correct answer, but why?

TO ADD FURTHER CONTEXT: now maybe it is because this is all in the same table, my question is why is the having implicitly checking to make sure the example 2 values in the list match those that are in the in clause. To test I created another very simple table, Testing_SQL which has the following fields:

Customer_ID, Order_ID, and Product_ID

and the following example data:

enter image description here

now... if you have the following SQL:

SELECT                Customer_ID
FROM                  Testing_SQL
WHERE                 Product_ID
IN                    (1,5)
GROUP BY              Customer_ID
HAVING COUNT(DISTINCT Product_ID) = 2

OK... now here is where I get messed up, if you look at what the where clause should return, it returns 101 and 102. this is ok! group by, Works great! Having? well if you look both 101, and 102, have purchased multiple things, so if you are counting distinct product_ID both have technically 3, but at least 2. why is this implicitly re-checking the where clause? because it only returns 101! the one that have purchased 1 AND 5.


Solution

  • It seems to me that you missunderstood the logic. Let's look at your last data step by step. First the data:

    WITH    --  S a m p l e   D a t a :
        tbl ( CUSTOMER_ID, ORDER_ID, PRODUCT_ID ) AS
            ( Select 101, 100, 1 From Dual Union All
              Select 101, 101, 7 From Dual Union All
              Select 101, 102, 5 From Dual Union All
              Select 102, 103, 5 From Dual Union All
              Select 102, 104, 6 From Dual Union All
              Select 102, 105, 2 From Dual Union All
              Select 103, 106, 7 From Dual Union All
              Select 103, 107, 3 From Dual Union All
              Select 103, 108, 2 From Dual 
            )
    

    If we select all the rows from the sample data that have PRODUCT_ID IN(1, 5)

    Select  * 
    From    tbl
    Where   PRODUCT_ID IN(1, 5)
    
    /*    R e s u l t :
    CUSTOMER_ID   ORDER_ID PRODUCT_ID
    ----------- ---------- ----------
            101        100          1
            101        102          5
            102        103          5 */
    

    the resultset above means that all other rows for this Select statement doesn't exist and whatever else we add to this code will be dealing just with those three rows, So, if we select just CUSTOMER_ID column...

    Select  CUSTOMER_ID 
    From    tbl
    Where   PRODUCT_ID IN(1, 5)
    
    /*
    CUSTOMER_ID
    -----------
            101
            101
            102 */
    

    ... we got customers from same three rows. Now if we GROUP BY we should get one row per customer ...

    Select CUSTOMER_ID  
    From tbl
    Where PRODUCT_ID IN(1, 5)
    Group By CUSTOMER_ID
    /*
    CUSTOMER_ID
    -----------
            101
            102 */
    

    Before we use HAVING clause we will put Count() aggregation in Select...

    Select CUSTOMER_ID, Count(DISTINCT PRODUCT_ID) "CNT"
    From tbl
    Where PRODUCT_ID IN(1, 5)
    Group By CUSTOMER_ID
    /*
    CUSTOMER_ID        CNT
    ----------- ----------
            101          2
            102          1 */
    

    this is now prety obvious that Customer Id HAVING 2 DISTINCT PRODUCT_IDs is 101. Moving that count into HAVING clause is just another condition to fetch the row or not...

    Select CUSTOMER_ID
    From tbl
    Where PRODUCT_ID IN(1, 5)
    Group By CUSTOMER_ID
    HAVING  Count(DISTINCT PRODUCT_ID) = 2
    /*
    CUSTOMER_ID
    -----------
            101 */
    

    Now if you look at the resultset of the first Select:

    /*    R e s u l t :
    CUSTOMER_ID   ORDER_ID PRODUCT_ID
    ----------- ---------- ----------
            101        100          1
            101        102          5
            102        103          5 */
    

    there is customer 101 with products 1 and 5 and customer 102 with just product 5. So your result is OK - CUSTOMER_ID with two Distinct products under where condition as in the question is 101