Search code examples
sqlsql-serverstring-agg

STRING_AGG() with duplicated values


I've found a lot of questions in here but none seems to resolve it. I do want to retrieve unique values with STRING_AGG() in SQL Server without using the keyword WITH.

This is my query:

SELECT DISTINCT 
    bld.Code AS building_code, 
    -- CONCAT(bld.Name, ' - via ', dir.Name) AS building_name,
    STRING_AGG(CAST(buildings.evaluation AS NVARCHAR(MAX)), ', ') AS [data] 
FROM
    Dealer de 
INNER JOIN 
    Road ro ON de.ID = ro.Dealer_ID  
INNER JOIN 
    Direction dir ON ro.ID = dir.Road_ID  
INNER JOIN 
    Building bld ON dir.ID = bld.Direction_ID  
INNER JOIN 
    BuildingType bt ON bld.BuildingType_ID = bt.ID  
INNER JOIN 
    StructuralUnit su ON bld.ID = su.Building_ID  
INNER JOIN 
    Span sp ON su.ID = sp.StructuralUnit_ID  
INNER JOIN 
    UnitElement ue ON sp.ID = ue.Span_ID  
INNER JOIN 
    (SELECT DISTINCT TOP 8 
         bld.Code AS building_code, 
         CONCAT(bld.Name, ' - via ', dir.Name) AS building_name,            
         dir.Name AS direction, 
         ROUND(AVG(ins.Evaluation), 2) AS evaluation, 
         YEAR(ig.Date) AS year      
     FROM  
         Building bld   
     INNER JOIN 
         Direction dir ON bld.Direction_ID = dir.ID     
     INNER JOIN 
         Road ro ON dir.Road_ID = ro.ID     
     INNER JOIN 
         Dealer de ON ro.Dealer_ID = de.ID  
     INNER JOIN 
         StructuralUnit su ON bld.ID = su.Building_ID   
     INNER JOIN 
         Span sp ON su.ID = sp.StructuralUnit_ID    
     INNER JOIN 
         UnitElement ue ON sp.ID = ue.Span_ID   
     INNER JOIN 
         Inspection ins ON ue.ID = ins.UnitElement_ID   
     INNER JOIN 
         InspectionGroup ig ON ig.ID = ins.InspectionGroup_ID   
     WHERE 
         ue.Status = 3      
         AND de.ID = 1      
         AND YEAR(ig.Date) IN (2021, 2020, 2019, 2018)      
         AND ig.InspectionTypeModel_ID <> 3     
     GROUP BY 
         bld.Code, bld.Name, dir.Name, YEAR(ig.Date)    
    ORDER BY 
         bld.Code, YEAR(ig.Date)) buildings ON buildings.building_code = bld.Code 
WHERE 
    ue.Status = 3   
    AND de.ID = 1  
GROUP BY 
    bld.Code --, bld.Name, dir.Name 
ORDER BY 
    bld.Code

The subquery (the one with the ROUND(AVG(ins.Evaluation), 2) AS evaluation) produces the correct distinct values:

enter image description here

But the full query doesn't work the way I though (with the DISTINCT and GROUP BY I though values would be unique, just 4 per row)

enter image description here

My desired results would be like this (don't mind the 3 dots at the end, I just resized the screen to show you only 4 values, that's how it's supposed to be)

enter image description here

Any suggestion, without using WITH?


Solution

  • Your main issue is that you are cross joining to your subquery. Your subquery has 4 rows for each building code, but you are only joining on code, therefore for every row in your outer query, you are returning 4 extra rows because of the subquery. If your data varies this could go up to as many as 8 (because of the TOP 8 in the subquery, without this there's more or less no limit to the duplication).

    As far as I can tell you can massively simplify the query, none of the outer query is required at all, and with a small change to the grouping on the subquery you can get the required data without all the additional joins. The TOP also seems unnecessary, with the proper grouping you would only ever get a maximum 4 rows per code anyway (one per year in the IN clause):

    SELECT  b.Code,
            Data = STRING_AGG(b.evaluation, ',') WITHIN GROUP(ORDER BY b.Year)
    FROM    (
                SELECT  bld.Code,
                        Year = YEAR(ig.Date),
                        evaluation = ROUND(AVG(ins.Evaluation), 2)
                FROM    Building AS bld
                        INNER JOIN Direction AS dir
                            ON bld.Direction_ID = dir.ID
                        INNER JOIN Road AS ro
                            ON dir.Road_ID = ro.ID
                        INNER JOIN Dealer AS de
                            ON ro.Dealer_ID = de.ID
                        INNER JOIN StructuralUnit AS su
                            ON bld.ID = su.Building_ID
                        INNER JOIN Span AS sp
                            ON su.ID = sp.StructuralUnit_ID
                        INNER JOIN UnitElement AS ue
                            ON sp.ID = ue.Span_ID
                        INNER JOIN Inspection AS ins
                            ON ue.ID = ins.UnitElement_ID
                        INNER JOIN InspectionGroup AS ig
                            ON ig.ID = ins.InspectionGroup_ID
                WHERE   ue.Status = 3
                AND     de.ID = 1
                AND     YEAR(ig.Date) IN (2021, 2020, 2019, 2018)
                AND     ig.InspectionTypeModel_ID <> 3
                GROUP BY
                        bld.Code, YEAR(ig.Date)
            ) AS b
    GROUP BY b.Code;