Search code examples
sql-serverssissql-server-2012sql-server-2014

Matrix-like comparison of two query results


Is there a way to compare a whole result set of a query against a result set of another query?

To become a little bit more clearer: I have implemented a process to perform several Data comparison, such as "Compare the number of records in Table X against the number of records in Table Y", "Compare the average amount in a table against the expected average amount X", "Compare the value of column X in Table Y for each day of the year against the average yearly amount" and so on.

Now, what I need to implement is the following: Perform a multi column query against a table for a given time span. Then perform the same query (same structure) against the table for a second timespan. Then compare each cell of each row of both query results and output only the rows with differences (ideally with a flag WHICH cell has changed).

And - as if it was not complicated enough - the queries are not static! Means I define numerous queries with a various number of columns for various checks. These queries are then parametrized (timespan etc.) and the results of defined "query pairs" should then be compared...

Any ideas on how to implement it? As a first step I would already be happy to know how to implement this table / matrix comparison...


Solution

  • SQL-Server is not very helpfull when it comes to generically analyse a resultset. There is dynamic SQL and there is - tatatataaaaa! - XML. I really appreciate XML's abilities to deal with unkown sets!

    The following code will tear down a resultset to key-name-value tupels which you can easily compare value by value:

    Two dummy tables. Data is very similar, but differences exist:

    DECLARE @tbl1 TABLE(ID INT,Value1 VARCHAR(10),Value2 VARCHAR,PointInTime DATETIME);
    INSERT INTO @tbl1 VALUES
     (1,'a','b',{d'2017-01-01'})
    ,(2,'a','b',{d'2017-01-02'})
    ,(3,'a','x',{d'2017-01-03'})  
    ,(4,NULL,'b',{d'2017-01-04'})
    
    DECLARE @tbl2 TABLE(ID INT,Value1 VARCHAR(10),Value2 VARCHAR,PointInTime DATETIME);
    INSERT INTO @tbl2 VALUES
     (1,'a','b',{d'2017-01-01'})
    ,(2,NULL,'b',{d'2017-01-02'})
    ,(3,'a','x',{d'2017-01-03'})  
    ,(4,'y','b',{d'2017-01-05'});
    

    --This will create an XML from the firs table

    DECLARE @xml1 XML=
    (
        SELECT * 
        FROM @tbl1 AS tbl 
        FOR XML RAW,ELEMENTS XSINIL,TYPE
    );
    

    --Check the output

    /*
    SELECT @xml1;
    <row xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <ID>1</ID>
      <Value1>a</Value1>
      <Value2>b</Value2>
      <PointInTime>2017-01-01T00:00:00</PointInTime>
    </row>
    [...more rows..]
    */
    

    --Same with the second table

    DECLARE @xml2 XML=
    (
        SELECT * 
        FROM @tbl2 AS tbl 
        FOR XML RAW,ELEMENTS XSINIL,TYPE
    );
    

    --We must tell SQL-Server the name of the row's ID for the later join

    DECLARE @rowIDName NVARCHAR(100)=N'ID';
    

    --Two CTEs create a list of ID-Name-Value tupels, which can be compared easily

    WITH AllVals1 AS
    (
        SELECT nd.value(N'(../*[local-name()=sql:variable("@rowIDName")])[1]',N'nvarchar(max)') AS RowID
              ,nd.value(N'local-name(.)',N'nvarchar(max)') AS ElementName
              ,nd.value(N'.',N'nvarchar(max)') AS ElementValue
        FROM @xml1.nodes(N'/row/*') AS A(nd)
    )
    ,AllVals2 AS
    (
        SELECT nd.value(N'(../*[local-name()=sql:variable("@rowIDName")])[1]',N'nvarchar(max)') AS RowID
              ,nd.value(N'local-name(.)',N'nvarchar(max)') AS ElementName
              ,nd.value(N'.',N'nvarchar(max)') AS ElementValue
        FROM @xml2.nodes(N'/row/*') AS A(nd)
    )
    SELECT v1.RowID,v1.ElementName,v1.ElementValue AS V1,v2.ElementValue AS V2
    FROM AllVals1 AS v1
    FULL OUTER JOIN AllVals2  AS v2 ON v1.RowID=v2.RowID AND v1.ElementName=v2.ElementName
    WHERE v1.ElementValue<>v2.ElementValue
    

    The result shows, that in Row 2 "Value1" was "a" and is now empty and in Row 4 "Value1" was empty and is now "y" and the "PointInTime differs:

    RowID   ElementName V1                  V2
    2       Value1      a   
    4       Value1                          y
    4       PointInTime 2017-01-04T00:00:00 2017-01-05T00:00:00