azure-cosmosdb, user-defined-functions, hamming-distance

How to calculate Hamming distance in CosmosDB?


Each item in my collection has a 64-bit number that represents the dHash of an image. I want to query by this field and return all items whose Hamming distance to a given value is greater or less than some parameter.

In MySQL I would use the BIT_COUNT function. Is there a built-in equivalent in CosmosDB? If not, what should my HAMMING_DISTANCE UDF look like, given that JS doesn't support bitwise operations on 64-bit numbers?


Solution

  • To solve this, I adapted code from long.js and ImageHash for use in a CosmosDB UDF. All kudos to their authors.

    See the gist here: https://gist.github.com/okolobaxa/55cc08a0d67bc60505bfe3ab4f8bc33c

    Usage:

    SELECT udf.HAMMING_DISTANCE(files.ContentId, '1279796919517872320') FROM files
    

    But please note a few things:

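    1. CosmosDB can't hold 64-bit integers as numbers without losing precision (its numbers are IEEE 754 doubles, exact only up to 2^53), so you have to store them as strings.
    2. Using this UDF costs a lot of RUs, since the function runs for every item the query scans.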
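
For readers who can't open the gist, here is a minimal sketch of what such a UDF could look like. It is not the gist's code: it assumes the hashes are stored as unsigned decimal strings (as in the usage example above), and the helper names parseUint64 and popcount32 are made up for illustration. The idea is to split each value into two 32-bit halves, because JS bitwise operators only work on 32-bit integers, then XOR the halves and count the set bits.

```javascript
// Hypothetical helper: parse an unsigned decimal string into two
// unsigned 32-bit halves (hi, lo) without BigInt or 64-bit arithmetic.
function parseUint64(str) {
  var TWO_32 = 4294967296;
  var hi = 0;
  var lo = 0;
  for (var i = 0; i < str.length; i++) {
    var digit = str.charCodeAt(i) - 48;
    if (digit < 0 || digit > 9) {
      throw new Error("Expected an unsigned decimal string, got: " + str);
    }
    // value = value * 10 + digit, carried out on the two halves.
    var loTimes10 = lo * 10 + digit;            // below 2^36, exact in a double
    var carry = Math.floor(loTimes10 / TWO_32);
    lo = loTimes10 % TWO_32;
    hi = (hi * 10 + carry) % TWO_32;            // anything past 64 bits wraps
  }
  return { hi: hi, lo: lo };
}

// Hypothetical helper: count set bits in a 32-bit integer by clearing
// the lowest set bit until nothing is left.
function popcount32(x) {
  var count = 0;
  x = x | 0;
  while (x !== 0) {
    x &= x - 1;
    count++;
  }
  return count;
}

// UDF entry point: Hamming distance between two 64-bit values
// passed as decimal strings.
function HAMMING_DISTANCE(a, b) {
  var x = parseUint64(String(a));
  var y = parseUint64(String(b));
  return popcount32(x.hi ^ y.hi) + popcount32(x.lo ^ y.lo);
}
```

Registered as a UDF, it would be called the same way as in the usage query above. To filter by distance, a condition such as WHERE udf.HAMMING_DISTANCE(files.ContentId, '1279796919517872320') <= 10 should work, where the threshold 10 is only an example. Signed (negative) values are not handled in this sketch.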

    I created a feature request on the CosmosDB Feedback forum to add built-in support for such functions. Please vote for these ideas if you're interested in them too: