Search code examples
sqlsql-servert-sqlsql-server-2016bitcount

How can I efficiently convert a bigint into a base 2 string and count the 1s?


I have a bigint column or variable. I want to convert it into its bitmap (base 2) representation as a varchar and additionally to that count the 1s.

I need both components - not just the bit_count equivalent

Example:

DECLARE @a bigint=-9151314442816847872
,@b bigint=71776119061217280

-9151314442816847872 converts to 1000000100000000000000000000000000000000000000000000000000000000 and there are 2 ones in the bitmap.

71776119061217280 converts to 0000000011111111000000000000000000000000000000000000000000000000 and there are 8 ones in it

The questions are:

  1. How to quickly convert from bigint to bitmap? and
  2. How to quickly counts 1s in the bitmap?

There are millions of numbers to process in a very short time. Every millisecond counts!

Solution needs to be compatible with SQL Server 2016.

Here is my code.

My algorithm converts 1mln numbers in 18 seconds. This is slow if millions of numbers have to be converted. BIT_COUNT() is magnitude of orders faster, but it is available since SQL 2022. I need a solution compatible with 2016.

Can it be done faster?

DECLARE @a bigint=71776119061217280;
DECLARE @s char(64);

SET @s= 
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
CONVERT(varchar,CAST(@a as binary(8)),2)
,'0','0000'),'1','0001'),'2','0010'),'3','0011'),'4','0100'),'5','0101'),'6','0110'),'7','0111')
,'8','1000'),'9','1001'),'A','1010'),'B','1011'),'C','1100'),'D','1101'),'E','1110'),'F','1111');

PRINT 'Bitmap: '+@s;
PRINT '1s count: '+CAST(64-LEN(REPLACE(@s,'1','')) as varchar);

Solution

  • I'd be minded to compute both elements independently rather than trying to combine them.

    The below runs fairly quickly for me. I did originally just return the Base2String and do the REPLACE and length arithmetic on it but then I was grappling to stop SQL Server re-evaluating the underlying Base2String string twice. And even when I prevented that doing the separate computations still seemed quicker.

    SELECT TOP (1000000) CAST(CRYPT_GEN_RANDOM(8) AS BIGINT) AS val
    INTO #BigInts
    FROM sys.all_objects o1, sys.all_objects o2
    
    SELECT val, Base2String, BitCount
    FROM #BigInts
    CROSS APPLY dbo.BigIntToBase2String(val)
    

    Indicative execution times for the above on my machine were 5.3 seconds. I did also try a couple of other options as below

    --evaluates Base2String twice - 7.5 seconds
    SELECT val, Base2String, 64-LEN(REPLACE(Base2String COLLATE Latin1_General_BIN2,'1','')) AS BitCount
    FROM #BigInts
    CROSS APPLY dbo.BigIntToBase2String(val)
    
    --evaluates Base2String once but adds a nested loops and is even slower 8.4 seconds
    SELECT val, Base2String, 64-LEN(REPLACE(Base2String COLLATE Latin1_General_BIN2,'1','')) AS BitCount
    FROM #BigInts
    OUTER APPLY dbo.BigIntToBase2String(val)
    OPTION (MAXDOP 1)
    

    It has a dependency on the following function that computes both individually and returns them as separate columns. This also has an advantage that if you only require the BitCount in some future scenario you can just select that column to use this without incurring unnecessary string processing.

    For the base2 string conversion my hunch is that potentially a single string concatenation of 64 elements will beat the conversion to hex string and multiple replace. Note that below uses the CONCAT operator to perform a single concatenation rather than + that will perform multiple binary concatenation operations.

    CREATE FUNCTION dbo.BigIntToBase2String 
    (   
        @input bigint
    )
    RETURNS TABLE 
    AS
    RETURN 
    (
    SELECT Base2String = 
    CONCAT(
    IIF(@input & 0x8000000000000000 = 0, '0','1'),
    IIF(@input & 0x4000000000000000 = 0, '0','1'),
    IIF(@input & 0x2000000000000000 = 0, '0','1'),
    IIF(@input & 0x1000000000000000 = 0, '0','1'),
    IIF(@input & 0x0800000000000000 = 0, '0','1'),
    IIF(@input & 0x0400000000000000 = 0, '0','1'),
    IIF(@input & 0x0200000000000000 = 0, '0','1'),
    IIF(@input & 0x0100000000000000 = 0, '0','1'),
    IIF(@input & 0x0080000000000000 = 0, '0','1'),
    IIF(@input & 0x0040000000000000 = 0, '0','1'),
    IIF(@input & 0x0020000000000000 = 0, '0','1'),
    IIF(@input & 0x0010000000000000 = 0, '0','1'),
    IIF(@input & 0x0008000000000000 = 0, '0','1'),
    IIF(@input & 0x0004000000000000 = 0, '0','1'),
    IIF(@input & 0x0002000000000000 = 0, '0','1'),
    IIF(@input & 0x0001000000000000 = 0, '0','1'),
    IIF(@input & 0x0000800000000000 = 0, '0','1'),
    IIF(@input & 0x0000400000000000 = 0, '0','1'),
    IIF(@input & 0x0000200000000000 = 0, '0','1'),
    IIF(@input & 0x0000100000000000 = 0, '0','1'),
    IIF(@input & 0x0000080000000000 = 0, '0','1'),
    IIF(@input & 0x0000040000000000 = 0, '0','1'),
    IIF(@input & 0x0000020000000000 = 0, '0','1'),
    IIF(@input & 0x0000010000000000 = 0, '0','1'),
    IIF(@input & 0x0000008000000000 = 0, '0','1'),
    IIF(@input & 0x0000004000000000 = 0, '0','1'),
    IIF(@input & 0x0000002000000000 = 0, '0','1'),
    IIF(@input & 0x0000001000000000 = 0, '0','1'),
    IIF(@input & 0x0000000800000000 = 0, '0','1'),
    IIF(@input & 0x0000000400000000 = 0, '0','1'),
    IIF(@input & 0x0000000200000000 = 0, '0','1'),
    IIF(@input & 0x0000000100000000 = 0, '0','1'),
    IIF(@input & 0x0000000080000000 = 0, '0','1'),
    IIF(@input & 0x0000000040000000 = 0, '0','1'),
    IIF(@input & 0x0000000020000000 = 0, '0','1'),
    IIF(@input & 0x0000000010000000 = 0, '0','1'),
    IIF(@input & 0x0000000008000000 = 0, '0','1'),
    IIF(@input & 0x0000000004000000 = 0, '0','1'),
    IIF(@input & 0x0000000002000000 = 0, '0','1'),
    IIF(@input & 0x0000000001000000 = 0, '0','1'),
    IIF(@input & 0x0000000000800000 = 0, '0','1'),
    IIF(@input & 0x0000000000400000 = 0, '0','1'),
    IIF(@input & 0x0000000000200000 = 0, '0','1'),
    IIF(@input & 0x0000000000100000 = 0, '0','1'),
    IIF(@input & 0x0000000000080000 = 0, '0','1'),
    IIF(@input & 0x0000000000040000 = 0, '0','1'),
    IIF(@input & 0x0000000000020000 = 0, '0','1'),
    IIF(@input & 0x0000000000010000 = 0, '0','1'),
    IIF(@input & 0x0000000000008000 = 0, '0','1'),
    IIF(@input & 0x0000000000004000 = 0, '0','1'),
    IIF(@input & 0x0000000000002000 = 0, '0','1'),
    IIF(@input & 0x0000000000001000 = 0, '0','1'),
    IIF(@input & 0x0000000000000800 = 0, '0','1'),
    IIF(@input & 0x0000000000000400 = 0, '0','1'),
    IIF(@input & 0x0000000000000200 = 0, '0','1'),
    IIF(@input & 0x0000000000000100 = 0, '0','1'),
    IIF(@input & 0x0000000000000080 = 0, '0','1'),
    IIF(@input & 0x0000000000000040 = 0, '0','1'),
    IIF(@input & 0x0000000000000020 = 0, '0','1'),
    IIF(@input & 0x0000000000000010 = 0, '0','1'),
    IIF(@input & 0x0000000000000008 = 0, '0','1'),
    IIF(@input & 0x0000000000000004 = 0, '0','1'),
    IIF(@input & 0x0000000000000002 = 0, '0','1'),
    IIF(@input & 0x0000000000000001 = 0, '0','1')
    ), 
    BitCount = 
    IIF(@input & 0x8000000000000000 = 0, 0, 1) + 
    IIF(@input & 0x4000000000000000 = 0, 0, 1) + 
    IIF(@input & 0x2000000000000000 = 0, 0, 1) + 
    IIF(@input & 0x1000000000000000 = 0, 0, 1) + 
    IIF(@input & 0x0800000000000000 = 0, 0, 1) + 
    IIF(@input & 0x0400000000000000 = 0, 0, 1) + 
    IIF(@input & 0x0200000000000000 = 0, 0, 1) + 
    IIF(@input & 0x0100000000000000 = 0, 0, 1) + 
    IIF(@input & 0x0080000000000000 = 0, 0, 1) + 
    IIF(@input & 0x0040000000000000 = 0, 0, 1) + 
    IIF(@input & 0x0020000000000000 = 0, 0, 1) + 
    IIF(@input & 0x0010000000000000 = 0, 0, 1) + 
    IIF(@input & 0x0008000000000000 = 0, 0, 1) + 
    IIF(@input & 0x0004000000000000 = 0, 0, 1) + 
    IIF(@input & 0x0002000000000000 = 0, 0, 1) + 
    IIF(@input & 0x0001000000000000 = 0, 0, 1) + 
    IIF(@input & 0x0000800000000000 = 0, 0, 1) + 
    IIF(@input & 0x0000400000000000 = 0, 0, 1) + 
    IIF(@input & 0x0000200000000000 = 0, 0, 1) + 
    IIF(@input & 0x0000100000000000 = 0, 0, 1) + 
    IIF(@input & 0x0000080000000000 = 0, 0, 1) + 
    IIF(@input & 0x0000040000000000 = 0, 0, 1) + 
    IIF(@input & 0x0000020000000000 = 0, 0, 1) + 
    IIF(@input & 0x0000010000000000 = 0, 0, 1) + 
    IIF(@input & 0x0000008000000000 = 0, 0, 1) + 
    IIF(@input & 0x0000004000000000 = 0, 0, 1) + 
    IIF(@input & 0x0000002000000000 = 0, 0, 1) + 
    IIF(@input & 0x0000001000000000 = 0, 0, 1) + 
    IIF(@input & 0x0000000800000000 = 0, 0, 1) + 
    IIF(@input & 0x0000000400000000 = 0, 0, 1) + 
    IIF(@input & 0x0000000200000000 = 0, 0, 1) + 
    IIF(@input & 0x0000000100000000 = 0, 0, 1) + 
    IIF(@input & 0x0000000080000000 = 0, 0, 1) + 
    IIF(@input & 0x0000000040000000 = 0, 0, 1) + 
    IIF(@input & 0x0000000020000000 = 0, 0, 1) + 
    IIF(@input & 0x0000000010000000 = 0, 0, 1) + 
    IIF(@input & 0x0000000008000000 = 0, 0, 1) + 
    IIF(@input & 0x0000000004000000 = 0, 0, 1) + 
    IIF(@input & 0x0000000002000000 = 0, 0, 1) + 
    IIF(@input & 0x0000000001000000 = 0, 0, 1) + 
    IIF(@input & 0x0000000000800000 = 0, 0, 1) + 
    IIF(@input & 0x0000000000400000 = 0, 0, 1) + 
    IIF(@input & 0x0000000000200000 = 0, 0, 1) + 
    IIF(@input & 0x0000000000100000 = 0, 0, 1) + 
    IIF(@input & 0x0000000000080000 = 0, 0, 1) + 
    IIF(@input & 0x0000000000040000 = 0, 0, 1) + 
    IIF(@input & 0x0000000000020000 = 0, 0, 1) + 
    IIF(@input & 0x0000000000010000 = 0, 0, 1) + 
    IIF(@input & 0x0000000000008000 = 0, 0, 1) + 
    IIF(@input & 0x0000000000004000 = 0, 0, 1) + 
    IIF(@input & 0x0000000000002000 = 0, 0, 1) + 
    IIF(@input & 0x0000000000001000 = 0, 0, 1) + 
    IIF(@input & 0x0000000000000800 = 0, 0, 1) + 
    IIF(@input & 0x0000000000000400 = 0, 0, 1) + 
    IIF(@input & 0x0000000000000200 = 0, 0, 1) + 
    IIF(@input & 0x0000000000000100 = 0, 0, 1) + 
    IIF(@input & 0x0000000000000080 = 0, 0, 1) + 
    IIF(@input & 0x0000000000000040 = 0, 0, 1) + 
    IIF(@input & 0x0000000000000020 = 0, 0, 1) + 
    IIF(@input & 0x0000000000000010 = 0, 0, 1) + 
    IIF(@input & 0x0000000000000008 = 0, 0, 1) + 
    IIF(@input & 0x0000000000000004 = 0, 0, 1) + 
    IIF(@input & 0x0000000000000002 = 0, 0, 1) + 
    IIF(@input & 0x0000000000000001 = 0, 0, 1)
    )