Search code examples
sqlsql-servert-sql

Difference from the first row but aggregating from second row till the end


I have two columns where in the first row (inspection), there are some violations that are spotted and noted... can be 15+ different violations.

Then on every subsequent inspection, there can be some violations that are completed (those that are resolved can NOT show again as unresolved). Number can vary and can even be that inspection didn't see any of remaining violations resolved so for that one it would be an empty row but still has to be counted.

Inspection Date Violation
1 6/1/2024 100
1 6/1/2024 101
1 6/1/2024 102
1 6/1/2024 103
2 6/2/2024 100
2 6/2/2024 101
3 6/7/2024 NULL
4 6/8/2024 103
5 6/9/2024 102

What I need as a result are unresolved violations after each inspection ...

enter image description here

I need this in T-SQL. I know how to use LAG, LEAD, FIRST_VALUE etc. Saying that as I tried to resolve it via window funcs. I have created func to get difference between the values (like violations remaining - violations in inspection) but really what I can't figure is how to move past 2nd inspection

Those are aggregates of the tables beneath and if you have some kind of advice I can go and change queries that bring data in this format. Only thing is that for an inspection I do not have recorded answers for each of violations spotted on first but only one that is cleared/resolved

Thanks in advance

SELECT aggdata.Inspection
     , RemainingViolations =
                            CASE
                                WHEN aggdata.Inspection = 1 THEN aggdata.AggregatedViolations
                                ELSE dbo.udf_GetDifferenceInValues(FIRST_VALUE(aggdata.AggregatedViolations) OVER (ORDER BY aggdata.Inspection), aggdata.AggregatedViolations)
                            END
FROM (
    SELECT SimpleData.Inspection
         , AggregatedViolations = STRING_AGG(SimpleData.Violation, ',')
    FROM (VALUES (1, '6/1/2024', 100),
    (1, '6/1/2024', 101),
    (1, '6/1/2024', 102),
    (1, '6/1/2024', 103),
    (2, '6/1/2024', 100),
    (2, '6/1/2024', 101),
    (3, '6/1/2024', NULL),
    (4, '6/1/2024', 103),
    (5, '6/1/2024', 102)
    ) AS
    SimpleData (Inspection, DateInspection, Violation)
    GROUP BY SimpleData.Inspection
) aggdata
ALTER FUNCTION udf_GetDifferenceInValues
(
      -- Add the parameters for the function here
      @String1 VARCHAR(4000)
    , @String2 VARCHAR(4000)
)
RETURNS VARCHAR(4000)
AS
BEGIN

    -- Declare the return variable here
    DECLARE @Result VARCHAR(4000)

    -- Add the T-SQL statements to compute the return value here

    SELECT @Result = STRING_AGG(Value, ',')
    FROM dbo.dbSplit(@String1, ',')
    WHERE Value NOT IN (
            SELECT Value
            FROM dbo.dbSplit(@String2, ',')
        )


    -- Return the result of the function
    RETURN @Result
END

Solution

  • /* *** Test Data *** */
    CREATE TABLE #t
    (
        Inspection int NOT NULL
        ,[Date] date NOT NULL
        ,Violation int NULL
    );
    INSERT INTO #t
    VALUES (1, '20240601', 100)
    ,(1, '20240601', 101)
    ,(1, '20240601', 102)
    ,(1, '20240601', 103)
    ,(2, '20240602', 100)
    ,(2, '20240602', 101)
    ,(3, '20240607', NULL)
    ,(4, '20240608', 103)
    ,(5, '20240609', 102);
    /* *** End Test Data *** */
    
    WITH vCounts
    AS
    (
        SELECT X.Inspection, T.Violation AS AViolation, T2.Violation
            ,COUNT(T2.Violation) OVER (PARTITION BY T.Violation ORDER BY X.Inspection ROWS UNBOUNDED PRECEDING) AS VCount
        FROM #t T
            JOIN
            (
                SELECT DISTINCT Inspection
                FROM #t
            ) X (Inspection)
                ON T.Inspection = 1
            LEFT JOIN #t T2
                ON X.Inspection = T2.Inspection
                    AND T.Violation = T2.Violation
    )
    ,VGroups
    AS
    (
        SELECT Inspection
            ,STRING_AGG(Violation, ',') WITHIN GROUP (ORDER BY AViolation) AS Violations
            ,STRING_AGG(IIF(Violation IS NULL AND VCount = 1, AViolation, NULL), ',') WITHIN GROUP (ORDER BY AViolation) AS ActiveViolations
        FROM VCounts
        GROUP BY Inspection
    )
    SELECT Inspection, Violations
        ,IIF(Inspection = 1, Violations, ActiveViolations) AS ActiveViolations
    FROM VGroups
    ORDER BY Inspection;