Search code examples
sql-server-2012group-byduplicatesinner-join

How to JOIN and GROUP correctly


I am having trouble understanding how to join three tables together without creating duplicate records.

I have my first query which uses table profile, and tabe instruments:

SELECT
  [p].[shopper_id]
, [pi].[instrument_id]
FROM
  [dbo].[profile] [p]
INNER JOIN [dbo].[profile_instruments] [pi]
ON  [pi].[PID] = [p].[PID]
WHERE
  [p].[date_created] > DATEADD(yy, -2, GETDATE())
  AND [p].[shopper_id] = '53D5444535434747A935E207C9EDD96A'
ORDER BY
  [p].[shopper_id];

This query gives me the results:

shopper_id  instrument_id
53D5444535434747A935E207C9EDD96A    35
53D5444535434747A935E207C9EDD96A    17

My second query uses table profile and table styles:

SELECT
  [p].[shopper_id]
, [ps].[style_id]
FROM
  [dbo].[profile] [p]
INNER JOIN [dbo].[profile_styles] [ps]
ON  [ps].[PID] = [p].[PID]
WHERE
  [p].[date_created] > DATEADD(yy, -2, GETDATE())
  AND [p].[shopper_id] = '53D5444535434747A935E207C9EDD96A'
ORDER BY
  [p].[shopper_id];

The results are:

shopper_id  style_id
53D5444535434747A935E207C9EDD96A    845
53D5444535434747A935E207C9EDD96A    291

When I combine the 3 tables profile, instruments and styles:

SELECT
    [p].[shopper_id]
  , [pi].[instrument_id]
  , [ps].[style_id]
FROM
    [dbo].[profile] [p]
INNER JOIN [dbo].[profile_instruments] [pi]
ON  [pi].[PID] = [p].[PID]
INNER JOIN [dbo].[profile_styles] [ps]
ON  [ps].[PID] = [p].[PID]
WHERE
    [p].[date_created] > DATEADD(yy, -2, GETDATE())
    AND [p].[shopper_id] = '53D5444535434747A935E207C9EDD96A'
ORDER BY
    [p].[shopper_id];

I get the results:

shopper_id  instrument_id   style_id
53D5444535434747A935E207C9EDD96A    35  845
53D5444535434747A935E207C9EDD96A    35  291
53D5444535434747A935E207C9EDD96A    17  845
53D5444535434747A935E207C9EDD96A    17  291

I am not using a group by yet because I am unsure how to apply it with the columns I have. I am also unsure if the duplicates are because of the type of join I am using, or because I am not using a group by.

At any rate, I would like to ask for some help to be able to figure out what modifications to my query I need to make in order to have an output that looks like:

shopper_id  instrument_id   style_id
53D5444535434747A935E207C9EDD96A    35  845
53D5444535434747A935E207C9EDD96A    17  291

Thank you in advance for your help.


Solution

  • Maybe...

    We assign a row number to each style and instrument for each PID. Then not only do we join by PID, we join by Row_number which guarantees when a pID has 2 instruments and two styles, we still only get 2 records; instead of 4.

    A full outer join is used as I don't know if you want to see situations where 2 instruments exists but only 1 style or 2 styles exist but only 1 instrument.

    SELECT [p].[shopper_id]
         , [pi].[instrument_id]
         , [ps].[style_id]
    FROM [dbo].[profile] [p]
    INNER JOIN (SELECT A.*, row_number() over (partition by PID order by instrument_ID) RN 
                FROM [dbo].[profile_instruments] A) [pi]
      ON  [pi].[PID] = [p].[PID]
    FULL OUTER JOIN (SELECT A.*, Row_number() over (partition by PID order by style_ID) RN 
                     FROM [dbo].[profile_styles] A)  [ps]
      ON  [ps].[PID] = [p].[PID]
     AND [PI].[RN] = [PS].[RN]
    WHERE [p].[date_created] > DATEADD(yy, -2, GETDATE())
      AND [p].[shopper_id] = '53D5444535434747A935E207C9EDD96A'
    ORDER BY [p].[shopper_id];
    

    We may be able to use a coalesce on the PI.RN or PS.RN or both if you are willing to accept that when styles and instruments have different counts that matching to a single value in either table is acceptable

    Example:

    PID STYLE_ID      PID Instrument_ID
    1   A             1   Z
    1   B
    
    The above should return
    1 A Z
    1 B
    
    But maybe you want
    1 A Z
    1 B Z
    

    This may work if we change AND [PI].[RN] = [PS].[RN] to AND coalesce([PI].[RN],1) = coalesce([PS].[RN],1) but testing needs to occur. As if one side is devoid of all records for a pid, you'd still get null on one table.