What I have going on here is:
Table 1: AOC_Model
AOC_ID int (Primary Key)
Model varchar(50)
Table 2: AOC_Chipset
AOC_CHIPSET_ID int (Primary Key)
CONTROLLER_ID int
CHIPSET_ID int
AOC_ID int
Table 3: Controller
CONTROLLER_ID int (Primary Key)
CONTROLLER varchar(10)
Table 4: Chipset
CONTROLLER_ID int (Primary Key)
CHIPSET_ID int (Primary Key)
CHIPSET varchar(50)
Table 5: Notes_Chipset
NOTES_CHIPSET_ID int (Primary Key)
CONTROLLER_ID int
CHIPSET_ID int
DATE date
NOTES varchar(800)
First I have a Chipset table that is joined to Controller via Controller_ID Then I have AOC_Chipset which actually acts as Joining table between Controller and Chipset. AOC_Chipset is joined to Chipset via Controller_ID and Chipset_id Then I have Notes_Chipset which is also joined to Chipset via Controller_ID and Chipset_id And Finally, I have AOC_Model which is joined to AOC_Chipset via AOC_ID
I have many to many relationships going on here. I could have one or two controllers assigned to the same AOC_ID. I could have one or two chipsets assigned to the same controller. I could have multiple Notes assigned to the same Chipset.
I created this query in SQL Server 2019:
SELECT
dbo.AOC_CHIPSET.AOC_ID,
string_agg(dbo.CONTROLLER.CONTROLLER, ', ') AS vControllers,
string_agg(dbo.CHIPSET.CHIPSET, ', ') AS vChipsets,
string_agg(dbo.NOTES_CHIPSET.DATE, ', ') AS vDate,
string_agg(dbo.NOTES_CHIPSET.NOTES, ', ') AS vNotes
FROM
dbo.AOC_CHIPSET
INNER JOIN
dbo.CHIPSET ON dbo.AOC_CHIPSET.CONTROLLER_ID = dbo.CHIPSET.CONTROLLER_ID
AND dbo.AOC_CHIPSET.CHIPSET_ID = dbo.CHIPSET.CHIPSET_ID
INNER JOIN
dbo.CONTROLLER ON dbo.CHIPSET.CONTROLLER_ID = dbo.CONTROLLER.CONTROLLER_ID
INNER JOIN
dbo.NOTES_CHIPSET ON dbo.CHIPSET.CONTROLLER_ID = dbo.NOTES_CHIPSET.CONTROLLER_ID
AND dbo.CHIPSET.CHIPSET_ID = dbo.NOTES_CHIPSET.CHIPSET_ID
AND dbo.CONTROLLER.CONTROLLER_ID = dbo.NOTES_CHIPSET.CONTROLLER_ID
GROUP BY
dbo.AOC_CHIPSET.AOC_ID
The problem is the result contains duplicates
I know I can use the DISTINCT
, but I can't figure out where / how to place it in conjunction with string_agg
?
I replaced the joins by sub-selects placed directly in the select list. This allows me to select distinct values per retrieved property. In fact, there are always 2 sub-selects per string to create. The inner one has a SELECT DISTINCT
and the outer one makes the string_agg
. The inner sub-select filters its rows with a WHERE-clause with AOC_ID
matching the main SELECT
SELECT
ac.AOC_ID,
(SELECT string_agg(CONTROLLER, ', ') FROM
(SELECT DISTINCT CONTROLLER
FROM dbo.CONTROLLER co INNER JOIN dbo.AOC_CHIPSET ac1
ON ac1.CONTROLLER_ID = co.CONTROLLER_ID
WHERE ac1.AOC_ID = ac.AOC_ID) x) AS vControllers,
(SELECT string_agg(CHIPSET, ', ') FROM
(SELECT DISTINCT CHIPSET
FROM dbo.CHIPSET cs INNER JOIN dbo.AOC_CHIPSET ac2
ON ac2.CONTROLLER_ID = cs.CONTROLLER_ID AND ac2.CHIPSET_ID = cs.CHIPSET_ID
WHERE ac2.AOC_ID = ac.AOC_ID) y) AS vChipsets,
(SELECT string_agg([DATE], ', ') FROM
(SELECT DISTINCT [DATE]
FROM dbo.NOTES_CHIPSET nd INNER JOIN dbo.AOC_CHIPSET ac3
ON ac3.CONTROLLER_ID = nd.CONTROLLER_ID AND ac3.CHIPSET_ID = nd.CHIPSET_ID
WHERE ac3.AOC_ID = ac.AOC_ID) z) AS vDate,
(SELECT string_agg(NOTES, ', ') FROM
(SELECT DISTINCT NOTES
FROM dbo.NOTES_CHIPSET nd INNER JOIN dbo.AOC_CHIPSET ac4
ON ac4.CONTROLLER_ID = nd.CONTROLLER_ID AND ac4.CHIPSET_ID = nd.CHIPSET_ID
WHERE ac4.AOC_ID = ac.AOC_ID) z) AS vNotes
FROM
dbo.AOC_CHIPSET ac
GROUP BY
ac.AOC_ID
The SELECT DISTINCT
in the sub-queries work, because the select-list does not contain any controller or chipset id. This would not be possible with joins on the outer level, since those require these ids.