Search code examples
sqlsql-servert-sqlazure-sql-databaserecursive-query

Self join or Recursive query to get the count


I have a requirement to group by rows where one value of a grouping set exists in another row. Like I have the below table

 | OrderID | ProductID |
 | 123     |    A      |
 | 123     |    B      |
 | 223     |    B      |
 | 223     |    C      |
 | 323     |    C      |
 | 323     |    D      |
 | 423     |    E      |

And I need the below. I have been thinking few ways but cant get anywhere. Any help please?

 |ProductId | NoOfOrders |
 | A|B|C|D  |     3      |
 |    E     |     1      |

Solution

  • Here you go. This starts with a simple grouping based on matching products, then uses a loop to recursively group things together until it can find no more merges to do:

    CREATE TABLE #data 
    (
        OrderId INT, 
        ProductId CHAR(1)
    );
    
    INSERT INTO #data (OrderId, ProductId)
    VALUES (123,'A'),
    (123,'B'),
    (223,'B'),
    (223,'C'),
    (323,'C'),
    (323,'D'),
    (423,'E');
    
    CREATE TABLE #data2
    (
        OrderId INT, 
        ProductId CHAR(1),
        Grp INT
    );
    
    WITH cte AS
    (
        SELECT OrderId,
               ProductId,
            DENSE_RANK() OVER (ORDER BY ProductId) r
        FROM #data
    )
    INSERT INTO #data2(OrderId, ProductId, Grp)
    SELECT cte.OrderId,
        cte.ProductId,
        r
    FROM cte
    
    DECLARE @updates INT = 1;
    
    WHILE @updates > 0
    BEGIN
        -- join groups where there is a lower numbered group that it connects to
        UPDATE a
        SET Grp = b.Grp
        FROM #data2 a
        INNER JOIN #data2 b ON b.OrderId = a.OrderId
            OR b.ProductId = a.ProductId
        WHERE a.Grp > b.Grp
    
        -- end when we have done nothing this cycle
        SET @updates = @@ROWCOUNT
    END
    
    SELECT 
        STUFF((SELECT DISTINCT '|'+ ProductId                                  
                FROM   #data2 
                WHERE grp = o.grp 
                FOR xml path('')
                ), 1, 1, '') AS Products,
                COUNT(DISTINCT o.OrderId) AS NumOrders
    FROM #data2 o
    GROUP BY grp