Search code examples
mysqlsqlbinaryhexchecksum

SQL string literal hexadecimal key to binary and back


after extensive search I am resorting to stack-overflows wisdom to help me. Problem: I have a database table that should effectively store values of the format (UserKey, data0, data1, ..) where the UserKey is to be handled as primary key but at least as an index. The UserKey itself (externally defined) is a string of 32 characters representing a checksum, which happens to be (a very big) hexadecimal number, i.e. it looks like this UserKey = "000000003abc4f6e000000003abc4f6e".

Now I can certainly store this UserKey in a char(32)-field, but I feel this being mighty inefficient, as I store a series of in principle arbitrary characters, i.e. reserving space for for more information per character than the 4 bits i need to store the hexadecimal characters (0..9,A-F).

So my thought was to convert this string literal into the hex-number it really represents, and store that. But this number (32*4 bits = 16Bytes) is much too big to store/handle as SQL only handles BIGINTS of 8Bytes.

My second thought was to convert this into a BINARY(16) representation, which should be compact and efficient concerning memory. However, I do not know how to efficiently convert between these two formats, as SQL also internally only handles numbers up to the maximum of 8 Bytes.

Maybe there is a way to convert this string to binary block by block and stitch the binary together somehow, in the way of:

 UserKey == concat( stringblock1, stringblock2, ..)
 UserKey_binary = concat( toBinary( stringblock1 ), toBinary( stringblock2 ), ..)

So my question is: is there any such mechanism foreseen in SQL that would solve this for me? How would a custom solution look like? (I find it hard to believe that I should be the first to encounter such a problem, as it has become quite modern to use ridiculously long hashkeys in many applications)

Also, the Userkey_binary should than act as relational key for the table, so I hope for a bit of speed by this more compact representation, as it needs to determine the difference on a minimal number of bits. Additionally, I want to mention that I would like to do any conversion if possible on the Server-side, so that user-scripts have not to be altered (the user-side should, if possible, still transmit a string literal not [partially] converted values in the insert statement)


Solution

  • In Contradiction to my previous statement, it seems that MySQL's UNHEX() function does a conversion from a string block by block and then concat much like I stated above, so the method works also for HEX literal values which are bigger than the BIGINT's 8 byte limitation. Here an example table that illustrates this:

    CREATE TABLE `testdb`.`tab` (
    `hexcol_binary` BINARY(16) GENERATED ALWAYS AS (UNHEX(charcol)) STORED,
    `charcol` CHAR(32) NOT NULL,
    PRIMARY KEY (`hexcol_binary`));
    

    The primary key is a generated column, so that that updates to charcol are the designated way of interacting with the table with string literals from the outside:

    REPLACE into tab (charcol) VALUES ('1010202030304040A0A0B0B0C0C0D0D0');
    SELECT HEX(hexcol_binary) as HEXstring, tab.* FROM tab;
    

    as seen building keys and indexes on the hexcol_binary works as intended.

    To verify the speedup, take

    ALTER TABLE `testdb`.`tab` 
    ADD INDEX `charkey` (`charcol` ASC);
    EXPLAIN SELECT * from tab where  hexcol_binary = UNHEX('1010202030304040A0A0B0B0C0C0D0D0') #keylength 16
    EXPLAIN SELECT * from tab where charcol = '1010202030304040A0A0B0B0C0C0D0D0' #keylength 97
    

    the lookup on the hexcol_binary column is much better performing, especially if its additonally made unique.

    Note: the hex conversion does not care if the hex-characters A through F are capitalized or not for the conversion process, however the charcol will be very sensitive to this.