Search code examples
sqloraclegroup-byleft-joinoracle-sqldeveloper

SQL - join three tables based on (different) latest dates in two of them



Using Oracle SQL Developer, I have three tables with some common data that I need to join.
Appreciate any help on this!
Please refer to https://i.sstatic.net/f37Jh.png for the input and desired output (table formatting doesn't work on all tables). These tables are made up in order to anonymize them, and in reality contain other data with millions of entries, but you could think of them as representing:

  • Product = Main product categories in a grocery store.
  • Subproduct = Subcategory products to the above. Each time the table is updated, the main product category may loses or get some new suproducts assigned to it. E.g. you can see that from May to June the Pulled pork entered while the Fishsoup was thrown out.
  • Issues = Status of the products, for example an apple is bad if it has brown spots on it..

What I need to find is: for each P_NAME, find the latest updated set of subproducts (SP_ID and SP_NAME), and append that information with the latest updated issue status (STATUS_FLAG).
Please note that each main product category gets its set of subproducts updated at individual occasions i.e. 1234 and 5678 might be "latest updated" on different dates.

I have tried multiple queries but failed each time. I am using combos of SELECT, LEFT OUTER JOIN, JOIN, MAX and GROUP BY.

Latest attempt, which gives me the combo of the first two tables, but missing the third:

SELECT
 PRODUCT.P_NAME,
 SUBPRODUCT.SP_PRODUCT_ID, SUBPRODUCT.SP_NAME, SUBPRODUCT.SP_ID, SUPPRODUCT.SP_VALUE_DATE
 FROM SUBPRODUCT
 LEFT OUTER JOIN PRODUCT ON PRODUCT.P_ID = SUBPRODUCT.SP_PRODUCT_ID
   JOIN(SELECT SP_PRODUCT_ID, MAX(SP_VALUE_DATE) AS latestdate FROM SUBPRODUCT GROUP BY SP_PRODUCT_ID) sub ON
sub.SP_PRODUCT_ID = SUBPRODUCT.SP_PRODUCT_ID AND sub.latestDate = SUBPRODUCT.SP_VALUE_DATE;

Solution

  • Trying to find a row with a max value is a common SQL pattern - you can do it with a join, like your example, but it's usually more clear to use a subquery or a window function.

    Correlated subquery example

    select 
      PRODUCT.P_NAME,
      SUBPRODUCT.SP_PRODUCT_ID, SUBPRODUCT.SP_NAME, SUBPRODUCT.SP_ID, SUPPRODUCT.SP_VALUE_DATE,
      ISSUES.STATUS_FLAG, ISSUES.STATUS_LAST_UPDATED
    from PRODUCT
    join SUBPRODUCT
      on PRODUCT.P_ID = SUBPRODUCT.SP_PRODUCT_ID
      and SUBPRODUCT.SP_VALUE_DATE = (select max(S2.SP_VALUE_DATE) as latestDate
                                      from SUBPRODUCT S2
                                      where S2.SP_PRODUCT_ID = SUBPRODUCT.SP_PRODUCT_ID)
    join ISSUES
      on ISSUES.ISSUE_ID = SUBPRODUCT.SP_ID
      and ISSUES.STATUS_LAST_UPDATED = (select max(I2.STATUS_LAST_UPDATED) as latestDate
                                      from ISSUES I2
                                      where I2.ISSUE_ID = ISSUES.ISSUE_ID)
    

    Window function / inline view example

    select 
      PRODUCT.P_NAME,
      S.SP_PRODUCT_ID, S.SP_NAME, S.SP_ID, S.SP_VALUE_DATE,
      I.STATUS_FLAG, I.STATUS_LAST_UPDATED
    from PRODUCT
    join (select SUBPRODUCT.*, 
                 max(SP_VALUE_DATE) over (partition by SP_PRODUCT_ID) as latestDate 
                 from SUBPRODUCT) S
      on PRODUCT.P_ID = S.SP_PRODUCT_ID
      and S.SP_VALUE_DATE = S.latestDate
    join (select ISSUES.*, 
                 max(STATUS_LAST_UPDATED) over (partition by ISSUE_ID) as latestDate 
                 from ISSUES) I
      on I.ISSUE_ID = S.SP_ID
      and I.STATUS_LAST_UPDATED = I.latestDate
    

    This often performs a bit better, but window functions can be tricky to understand.