Search code examples
sql-serverstring-aggregation

Aggregate without duplicates


What I have going on here is:

Table 1: AOC_Model

AOC_ID int (Primary Key)
Model varchar(50)

Table 2: AOC_Chipset

AOC_CHIPSET_ID int (Primary Key)
CONTROLLER_ID int
CHIPSET_ID int
AOC_ID int

Table 3: Controller

CONTROLLER_ID int (Primary Key)
CONTROLLER varchar(10)

Table 4: Chipset

CONTROLLER_ID int (Primary Key)
CHIPSET_ID int (Primary Key)
CHIPSET varchar(50)

Table 5: Notes_Chipset

NOTES_CHIPSET_ID int (Primary Key)
CONTROLLER_ID int
CHIPSET_ID int
DATE date
NOTES varchar(800)

First I have a Chipset table that is joined to Controller via Controller_ID Then I have AOC_Chipset which actually acts as Joining table between Controller and Chipset. AOC_Chipset is joined to Chipset via Controller_ID and Chipset_id Then I have Notes_Chipset which is also joined to Chipset via Controller_ID and Chipset_id And Finally, I have AOC_Model which is joined to AOC_Chipset via AOC_ID

I have many to many relationships going on here. I could have one or two controllers assigned to the same AOC_ID. I could have one or two chipsets assigned to the same controller. I could have multiple Notes assigned to the same Chipset.

I created this query in SQL Server 2019:

SELECT
    dbo.AOC_CHIPSET.AOC_ID, 
    string_agg(dbo.CONTROLLER.CONTROLLER, ', ') AS vControllers, 
    string_agg(dbo.CHIPSET.CHIPSET, ', ') AS vChipsets, 
    string_agg(dbo.NOTES_CHIPSET.DATE, ', ') AS vDate,
    string_agg(dbo.NOTES_CHIPSET.NOTES, ', ') AS vNotes
FROM
    dbo.AOC_CHIPSET 
INNER JOIN 
    dbo.CHIPSET ON dbo.AOC_CHIPSET.CONTROLLER_ID = dbo.CHIPSET.CONTROLLER_ID 
                AND dbo.AOC_CHIPSET.CHIPSET_ID = dbo.CHIPSET.CHIPSET_ID 
INNER JOIN 
    dbo.CONTROLLER ON dbo.CHIPSET.CONTROLLER_ID = dbo.CONTROLLER.CONTROLLER_ID 
INNER JOIN 
    dbo.NOTES_CHIPSET ON dbo.CHIPSET.CONTROLLER_ID = dbo.NOTES_CHIPSET.CONTROLLER_ID  
                      AND dbo.CHIPSET.CHIPSET_ID = dbo.NOTES_CHIPSET.CHIPSET_ID 
                      AND dbo.CONTROLLER.CONTROLLER_ID = dbo.NOTES_CHIPSET.CONTROLLER_ID
GROUP BY 
    dbo.AOC_CHIPSET.AOC_ID

The problem is the result contains duplicates

duplicates

I know I can use the DISTINCT, but I can't figure out where / how to place it in conjunction with string_agg?


Solution

  • I replaced the joins by sub-selects placed directly in the select list. This allows me to select distinct values per retrieved property. In fact, there are always 2 sub-selects per string to create. The inner one has a SELECT DISTINCT and the outer one makes the string_agg. The inner sub-select filters its rows with a WHERE-clause with AOC_ID matching the main SELECT

    SELECT
        ac.AOC_ID,
        (SELECT string_agg(CONTROLLER, ', ') FROM
          (SELECT DISTINCT CONTROLLER
           FROM dbo.CONTROLLER co INNER JOIN dbo.AOC_CHIPSET ac1
              ON ac1.CONTROLLER_ID = co.CONTROLLER_ID
           WHERE ac1.AOC_ID = ac.AOC_ID) x) AS vControllers,
        (SELECT string_agg(CHIPSET, ', ') FROM
          (SELECT DISTINCT CHIPSET
           FROM dbo.CHIPSET cs INNER JOIN dbo.AOC_CHIPSET ac2 
              ON ac2.CONTROLLER_ID = cs.CONTROLLER_ID AND ac2.CHIPSET_ID = cs.CHIPSET_ID
           WHERE ac2.AOC_ID = ac.AOC_ID) y) AS vChipsets,
        (SELECT string_agg([DATE], ', ') FROM
          (SELECT DISTINCT [DATE]
           FROM dbo.NOTES_CHIPSET nd INNER JOIN dbo.AOC_CHIPSET ac3
              ON ac3.CONTROLLER_ID = nd.CONTROLLER_ID AND ac3.CHIPSET_ID = nd.CHIPSET_ID
           WHERE ac3.AOC_ID = ac.AOC_ID) z) AS vDate,
        (SELECT string_agg(NOTES, ', ') FROM
          (SELECT DISTINCT NOTES
           FROM dbo.NOTES_CHIPSET nd INNER JOIN dbo.AOC_CHIPSET ac4
              ON ac4.CONTROLLER_ID = nd.CONTROLLER_ID AND ac4.CHIPSET_ID = nd.CHIPSET_ID
           WHERE ac4.AOC_ID = ac.AOC_ID) z) AS vNotes
    FROM
        dbo.AOC_CHIPSET ac
    GROUP BY
        ac.AOC_ID
    

    The SELECT DISTINCT in the sub-queries work, because the select-list does not contain any controller or chipset id. This would not be possible with joins on the outer level, since those require these ids.