Search code examples
pythonbitwise-operatorsxor

Unexpected behavior when computing the XOR between two signed 64-bit integers in python


I need to compute the hamming distance between two integers by counting the number of differing bits between their binary representations.

This is the function that I am using for that purpose:

def hamming(a, b):
    # compute and return the Hamming distance between the integers
    return bin(int(a) ^ int(b)).count("1")

I started to conduct some simple tests on this function to make sure it works properly but almost immediately I see that it does not and I am trying to understand as to why.

I tested the function with these two numbers:

a = -1704441252336819740
b = -1704441252336819741

The binary representations of these numbers given by python are:

bin(a): -0b10111 10100111 01100100 01001001 11011010 00001110 11011110 00011100 
bin(b): -0b10111 10100111 01100100 01001001 11011010 00001110 11011110 00011101

As you can see their binary representations are the same aside for the first digit thus the hamming distance should be 1. However, the returned hamming distance from the function is 3 and I can't seem to understand why.

The issue arises when I compute the XOR between these two digits as a ^ b returns 7 (thus counts 3 '1' bits) when I would expect it to return 1 (and count 1 '1' bit).

I believe this has to do with the fact that the XOR value seems to be getting stored as an unsigned integer with the minimal number of possible bits whereas I need it to be stored as a

How am I misunderstanding the XOR operator and how can I change my function to work the way I want it to?


Solution

  • Actually, it is the bin function that is misleading:
    Instead of displaying the actual binary value stored, it displays |x| (absolute value) and prints minus sign in front of it for negative numbers.

    But, that is not how the values are actually stored.

    XOR operates on the actual binary values which are stored in two's compliment, and that is why you are getting bigger bit difference then you expected.

    As a simple example lets take two 4 bit numbers:

    -10 = 0b0110
    -11 = 0b0101
      ^ = 0b0011
    

    As you can see, in this representation there are two bits of difference between these two numbers, while if they were positive, there would be only one bit difference.