Computing hamming distances on large data set

I have an input file of about 10^5 rows.
Each row is a sequence of 24 bits, i.e.:

1 0 1 1 1 0 1 0 1 0 1 1 1 0 1 0 1 0 1 1 1 0 1 0

I need to compute the Hamming Distance for every pair of rows. Here's my first implementation using SciPy hamming function:

from scipy.spatial.distance import hamming

with open('input.txt', 'r') as file:
    reader = csv.reader(file, delimiter=' ')
    nodes = {}
    b = 24  # Number of bits
    for nodeNum, node in enumerate(reader):
        node[nodeNum] = [int(i) for i in node]

for u, uBits in nodes.items():
    for v, vBits in nodes.items():
        distance = hamming(uBits, vBits) * b
            # Do stuff

Second implementation I came up with:

node[nodeNum] = sum([int(bit)*2**power for power, bit in enumerate(node)])

Here I only store the decimal value but I then have to manually count set bits resulting from each XOR operation:

def hamming(a, b):
    N = a ^ b

    distance = 0
    ptr = 1

    while N:
        distance += ((N + 1) //
                           2 * ptr)
        N -= (N + 1) // 2
        ptr += 1

    return distance

How can I improve my code (both in terms of memory usage and runtime, ideally)?

Solution

This may be the fastest you can do (watchout, for your data size it'd require to allocate 74.5 GiB of memory):

import numpy as np

nodes = []
with open('input.txt', 'r') as file:
    reader = csv.reader(file, delimiter=' ')
    for node in reader:
        nodes.append([int(i) for i in node])

dists = 2 * np.inner(nodes-0.5, 0.5-nodes) + nodes.shape[1] / 2

Just for fun, here is a 40X faster version in Julia:

using LoopVectorization, Tullio

function hamming!(nodes,dists)
    @tullio dists[i,j] = sum(nodes[i,k] ⊻ nodes[j,k])
end

n = 10^5
nodes = rand(Int8[0,1],n,24)
dists = Matrix{Int8}(undef,n,n)
@time hamming!(nodes,dists) # Run twice
  # 1.886367 seconds (114 allocations: 6.594 KiB)

While we're at it, I invite you to enter the world of Julia. It offers similar speeds to C++ and a pleasant syntax similar to Python.