Search code examples
sql-serveralgorithmt-sqlset-cover

Query for Set Cover


Good Day, I would like to implement a T-SQL query for the Set Cover Problem but have been unable to find any hints on how to do this in SQL.

In my case, my table just has two columns (IDnumber and Mut) and I want to find the minimum number of IDNumber to get one of every Mut. I really want to obtain three IDnumbers for every Mut but I figured I better start off with just one because that might be easier.

DECLARE @myTable TABLE (
    IDnumber int,
    Mut varchar(1))

INSERT @myTable VALUES 
 (1,'C'), (1,'N'), (1,'Z'), (1,'M'), (1,'E'), (2,'E'), (3,'B'), (3,'N'), (3,'D'), (3,'K'), 
(3,'W'), (4,'O'), (4,'G'), (4,'N'), (4,'B'), (4,'U'), (4,'C'), (5,'Q'), (5,'H'), (6,'K'), 
(6,'Y'), (6,'M'), (6,'A'), (6,'O'), (6,'U'), (6,'J'), (7,'H'), (7,'U'), (7,'M'), (7,'L'), 
(8,'B'), (8,'K'), (8,'P'), (9,'Y'), (9,'K'), (10,'Z'), (11,'R'), (12,'X'), (12,'R'), 
(12,'O'), (12,'Z'), (4,'C'), (1,'Z'), (4,'S'), (6,'E'), (5,'G'), (4,'C'), (4,'S'), (4,'H'), 
(6,'D'), (7,'W'), (3,'U'), (6,'N'), (7,'Y'), (6,'N'), (6,'F'), (4,'C'), (4,'I'), (7,'P'), 
(10,'H'), (10,'Z'), (10,'S'), (7,'Z'), (6,'B'), (7,'Z'), (8,'X'), (8,'J'), (8,'P'), (10,'K'), 
(8,'K'), (12,'P'), (8,'W'), (10,'M'), (12,'F'), (9,'T'), (9,'D'), (14,'Y'), (12,'P'), 
(14,'J'), (13,'D'), (15,'H'), (12,'J'), (6,'H'), (2,'Z'), (8,'G'), (10,'Q'), (6,'D'), 
(5,'X'), (9,'T'), (6,'W'), (6,'K'), (10,'W'), (7,'J'), (11,'W'), (12,'V'), (9,'F'), (7,'F'), 
(4,'M'), (5,'K'), (12,'G'), (12,'T'), (15,'T'), (13,'W'), (7,'J'), (9,'T'), (10,'U'), (9,'S'), 
(10,'L'), (10,'J'), (10,'H'), (11,'H'), (12,'S'), (12,'A'), (14,'L'), (13,'K'), (13,'D'), 
(4,'M'), (3,'N'), (4,'F'), (7,'M'), (7,'V'), (5,'R'), (4,'K'), (5,'F'), (7,'G'), (8,'M'), 
(4,'X'), (7,'F'), (9,'S'), (7,'N'), (6,'W'), (6,'W'), (5,'S'), (9,'Z'), (10,'I'), (11,'Y'), 
(11,'D'), (9,'X'), (7,'G'), (9,'S'), (9,'H'), (9,'T'), (8,'J'), (10,'U'), (9,'F'), (9,'S'), 
(7,'D'), (14,'R'), (10,'F'), (7,'E'), (15,'M'), (12,'F'), (5,'C'), (8,'E'), (16,'G'), (11,'V'),
(10,'I'), (12,'I'), (11,'Y'), (12,'I'), (14,'J'), (15,'D'), (19,'J'), (16,'B'), (12,'G'), 
(9,'J'), (18,'J'), (18,'C'), (16,'Q'), (18,'P'), (13,'F'), (19,'T'), (15,'J'), (15,'R'), 
(15,'Q'), (15,'O'), (11,'A'), (24,'B'), (19,'S'), (22,'I'), (15,'X'), (20,'T'), (15,'E'), 
(9,'V'), (8,'H'), (16,'N'), (17,'H')
--  Since the above list was generated by a bunch of random numbers/letters I need to
-- delete the duplicates

;WITH cte AS (
  SELECT IDnumber, mut, 
     row_number() OVER(PARTITION BY IDNumber, mut ORDER BY IDNumber) AS [rn]
  FROM @myTable
)
DELETE cte WHERE [rn] > 1


SELECT *
FROM ( SELECT IDnumber, Mut FROM @myTable) AS S
PIVOT
(COUNT(Mut) FOR mut IN ([A],[B],[C],[D],[E],[F],[G],[H],[I],[J],[K],[L],[M],[N],[O],[P],
[Q],[R],[S],[T],[U],[V],[W],[X],[Y],[Z])) AS pvt

So you can see from the pivot table that the minimum IDnumbers would be 3,5,7, and 12.

How would one go about implementing the algorithm? It seems to me that I could find all the combinations (2^6) which would be sets and then determine which sets have all the Muts. The set with the smallest number of IDnumbers is then the minimal set.

That kind of brute force may work but it would be horribly inefficient. My real world case is not enormous, I have 43 unique Muts (not the nine in the example) and ~2000 IDnumbers, but I am thinking that would take some time to run because 2^2000 is pretty darn big...

Thanks!


Solution

  • Here's an approach which calculates a bitmask of Mut values for each IDNumber, then uses a recursive CTE to generate all the possible combinations which complete the set. The code outputs all the possible IdNumber combinations which give the final result, excluding those where one or more IdNumber is redundant in a combination (such as 1,2,3,4,5,6 in the sample data set).

    To convert this to work with your real data, the major difference is going to be that you'll almost certainly need to find another mechanism to assign bit values to Mut values. (I can cheat a bit here and calculate the bit values by manipulating the decimal ASCII code for each Mut letter - POWER(2,ASCII(Mut) - 64)).

    DECLARE @myTable TABLE (
        IDnumber int,
        Mut varchar(1))
    
    INSERT @myTable VALUES 
        (1,'A'),(1,'B'),(1,'C'),(1,'D'),
        (2,'A'),(2,'C'),(2,'D'),
        (3,'A'),(3,'B'),(3,'F'),(3,'Z'),
        (4,'A'),(4,'B'),(4,'E'),(4,'F'),
        (5,'Y'),
        (6,'X'),(6,'Z')
    
    -- calculate the target bitmask
    DECLARE @target bigint = (  SELECT SUM(POWER(2,ASCII(Mut) - 64))
                                FROM    (SELECT DISTINCT Mut FROM @myTable) AS x
                             )
    
    ;WITH baseCTE
    AS
    (
        --calculate a starting bitmask for each ID
        SELECT IDnumber, sum(mutbit) AS bitmask 
        FROM    (SELECT DISTINCT IDnumber, CAST(POWER(2,ASCII(Mut) - 64) AS bigint) AS mutbit FROM @myTable) AS x
        GROUP BY IDnumber
    )
    ,recCTE
    AS
    (
        SELECT IDnumber, bitmask, CAST(IdNumber AS varchar(max)) AS trail, 1 AS lev
        FROM baseCTE
        UNION ALL
        SELECT b.IDnumber, b.bitmask | r.bitmask, CAST(CONCAT(r.trail,',',b.IDnumber) AS varchar(max)), r.lev + 1
        FROM recCTE AS r
        JOIN baseCTE AS b
        ON b.IDnumber > r.IDnumber
        WHERE b.bitmask | r.bitmask <> r.bitmask --ignore combinations which don't add any new Mut values
    )
    SELECT trail, lev 
    FROM recCTE
    WHERE bitmask = @target
    ORDER BY lev
    

    NB. SQL Server bitwise operators only work where one or other value is of an integer type, so this solution is restricted to 64 distinct Mut values (where the mask is a bigint) - to get it to work beyond that, you'd have to use a custom bitwise comparator (possibly using the CLR). However, since the question mentions that you have 43 Mut values, it might work for you for now.