Search code examples
sqloracle-databasejoinsql-insemi-join

Equality of "select ... where in" and joins


Suppose I have a table1 like this:

id | itemcode
-------------
1  | c1
2  | c2
...

And a table2 like this:

item | name
-----------
c1   | acme
c2   | foo
...

Would the following two queries return the same result set under every condition?

SELECT id, itemcode 
FROM table1 
WHERE itemcode IN (SELECT DISTINCT item 
                  FROM table2 
                  WHERE name [some arbitrary test])

SELECT id, itemcode 
FROM table1 
   JOIN (SELECT DISTINCT item 
          FROM table2
          WHERE name [some arbitrary test]) items 
         ON table1.itemcode = items.item

Unless I'm really missing something stupid, I'd say yes. But I've done two queries which boil down to this form and I am getting different results. There are some nested queries using WHERE IN, but for the last step I've noticed a JOIN is much faster. The nested queries are all entirely isolated so I don't believe they are the problem, so I just want to eliminate the possibility that I've got a misconception regarding the above.

Thanks for any insights.

EDIT

The two original queries:

SELECT imitm, imlitm, imglpt 
    FROM jdedata.F4101 
    WHERE imitm IN 
  (SELECT DISTINCT ivitm AS itemno 
       FROM jdedata.F4104 
       WHERE ivcitm IN 
    (SELECT DISTINCT ivcitm AS legacycode 
             FROM jdedata.F4104 
              WHERE ivitm IN 
      (SELECT DISTINCT tritm 
               FROM trigdata.F4101_TRIG)
    )
  )


SELECT orig.imitm, orig.imlitm, orig.imglpt 
    FROM jdedata.F4101 orig 
        JOIN 
        (SELECT DISTINCT ivitm AS itemno 
        FROM jdedata.F4104 
        WHERE ivcitm IN 
              (SELECT DISTINCT ivcitm AS legacycode 
              FROM jdedata.F4104
               WHERE ivitm IN 
                 (SELECT DISTINCT tritm 
                  FROM trigdata.F4101_TRIG))) itemns 
ON orig.imitm = itemns.itemno

EDIT 2

Although I still don't understand why the queries returned different results, it would seem our logic was flawed from the beginning since we were using the wrong columns in some parts. Mind that I'm not saying I made a mistake interpreting the queries as written above or had some typo, we just needed to select on some different stuff.

Normally I don't rest until I get to the bottom of things like these, but I'm very tired and am entering my first vacation since January that spans more than one day, so I can't really be bothered searching further right now. I'm sure the tips given here will come in handy later. Upvotes have been distributed for all the help and I've accepted Ypercube's answer, mostly because his comments have led me the furthest. But thanks all round! If I do find out more later, I'll try to remember pinging back in.


Solution

  • Since table2.item is not nullable, the 2 versions are equivalent. You can remove the distinct from the IN version, it's not needed. You can check these 3 versions and their execution plans:

    SELECT id, itemcode FROM table1 WHERE itemcode IN
      ( SELECT item FROM table2 WHERE name [some arbitrary test] )
    
    SELECT id, itemcode FROM table1 JOIN
      ( SELECT DISTINCT item FROM table2 WHERE name [some arbitrary test] )
      items ON table1.itemcode = items.item
    
    SELECT id, itemcode FROM table1 WHERE EXISTS
      ( SELECT * FROM table2 WHERE table1.itemcode = table2.item 
                               AND (name [some arbitrary test]) )