Search code examples
sql-serversql-server-2008varbinary

Convert SQL Server varbinary(max) into a set of primary keys of type int


Disclaimer: not my code, not my database design!

I have a column of censusblocks(varbinary(max), null) in a MS SQL Server 2008 db table (call it foo for simplicity).

This column is actually a null or 1 to n long list of int. The ints are actually foreign keys to another table (call it censusblock with a pk id of type of int), numbering from 1 to ~9600000.

I want to query to extract the censusblocks list from foo, and use the extracted list of int from each row to look up the corresponding censusblock row. There's a long, boring rest of the query that will be used from there, but it needs to start with the census blocks pulled from the foo table's censusblocks column.

This conversion-and-look-up is currently handled on the middle tier, with a small .NET utility class to convert from List<int> to byte[] (and vice versa), which is then written into/read from the db as varbinary. I would like to do the same thing, purely in SQL.

The desired query would go something along the lines of

SELECT f.id, c.id
FROM foo f 
LEFT OUTER JOIN censusblock c ON 
c.id IN f.censusblocks --this is where the magic happens
where f.id in (1,2)

Which would result in:

f.id   |   c.id

 1         8437314
 1         8438819
 1         8439744
 1         8441795
 1         8442741
 1         8444984
 1         8445568
 1         8445641
 1         8447953
 2         5860657
 2         5866881
 2         5866881
 2         5866858
 2         5862557
 2         5870475
 2         5868983
 2         5865207
 2         5863465
 2         5867301
 2         5864057
 2         5862256

NB: the 7-digit results are coincidental. The range is, as stated above, 1-7 digits.

The actual censusblocks column looks like

SELECT TOP 2 censusblocks FROM foo

which results in

censublocks

0x80BE4280C42380C7C080CFC380D37580DC3880DE8080DEC980E7D1
0x596D3159858159856A59749D59938B598DB7597EF7597829598725597A79597370

For further clarification, here's the guts of the .NET utility classes conversion methods:

    public static List<int> getIntegersFromBytes(byte[] data)
    {
        List<int> values = new List<int>();
        if (data != null && data.Length > 2)
        {
            long ids = data.Length / 3;
            byte[] oneId = new byte[4];
            oneId[0] = 0;
            for (long i = 0; i < ids; i++)
            {
                oneId[0] = 0;
                Array.Copy(data, i * 3, oneId, 1, 3);
                if (BitConverter.IsLittleEndian)
                { Array.Reverse(oneId); }
                values.Add(BitConverter.ToInt32(oneId, 0));
            }}
        return values;
    }

    public static byte[] getBytesFromIntegers(List<int> values)
    {
        byte[] data = null;
        if (values != null && values.Count > 0)
        {
            data = new byte[values.Count * 3];
            int count = 0;
            byte[] idBytes = null;
            foreach (int id in values)
            {
                idBytes = BitConverter.GetBytes(id);
                if (BitConverter.IsLittleEndian)
                { Array.Reverse(idBytes); }
                Array.Copy(idBytes, 1, data, count * 3, 3);
                count++;
            } }
        return data;
    }

Solution

  • An example of how this might be done. It is unlikely to scale brilliantly. If you have a numbers table in your database it should be used in place of nums_cte.

    This works by converting the binary value to a literal hex string, then reading it in 8-character chunks

    -- create test data
    DECLARE @foo TABLE
    (id int ,
     censusblocks varbinary(max)
    )
    
    DECLARE @censusblock TABLE
    (id int)
    
    INSERT @censusblock (id)
    VALUES(1),(2),(1003),(5030),(5031),(2),(6)
    
    
    INSERT @foo (id,censusblocks)
    VALUES (1,0x0000000100000002000003EB),
    (2,0x000013A6000013A7)
    
    
    --query
    DECLARE @biMaxLen bigint
    SELECT @biMaxLen = MAX(LEN(CONVERT(varchar(max),censusblocks,2))) FROM @foo
    
    ;with nums_cte
    AS
    (
        SELECT TOP (@biMaxLen) ((ROW_NUMBER() OVER (ORDER BY a.type) - 1) * 8) AS n
        FROM master..spt_values as a
        CROSS JOIN master..spt_values as b
    )
    ,binCTE
    AS
    (
        SELECT d.id, CAST(CONVERT(binary(4),SUBSTRING(s,n + 1,8),2) AS int) as cblock
        FROM (SELECT Id, CONVERT(varchar(max),censusblocks,2) AS s FROM @foo) AS d
        JOIN nums_cte
        ON n < LEN(d.s)
    )
    SELECT *
    FROM    binCTE as b
    LEFT
    JOIN    @censusblock c
    ON      c.id = b.cblock
    ORDER BY b.id, b.cblock
    

    You could also consider adding your existing .Net conversion methods into the database as an assembly and accessing them through CLR functions.