Search code examples
pythonnumba

Weird address issue with linked list implemented by numba jitclass


I'm trying to implement an LRU cache with doubly linked list and hash map. The doubly linked list is quite simple to implement by following the official example using jitclass. However, some bug in the bug made me check the logic in the linked list implementation. In the end, I'm able to reproduce the issue with the following minimal example:

from numba import int32, deferred_type, optional
from numba.experimental import jitclass

node_type = deferred_type()


@jitclass([
    ('key', int32),
    ('value', int32),
    ('next', optional(node_type))
])
class LinkedNode:
    def __init__(self, key, value, next):
        self.key = key
        self.value = value
        self.next = next
        
node_type.define(LinkedNode.class_type.instance_type)

def print_node(node):
    print(hex(id(node)), f'node({node.key}, {node.value})')

def foo():
    head = LinkedNode(-1, -1, None)
    curr = head
    for i in range(10):
        new_node = LinkedNode(i, i, None)
        curr.next = new_node
        curr = new_node
    node = head
    for i in range(10):
        node = node.next
        print_node(node)

foo()

If you save it in a .py file and run it from terminal, you will get some weird result like the following:

$ python minimal_example.py
0x191a8ac7760 node(0, 0)
0x191a8ac76d0 node(1, 1)
0x191a8ac7760 node(2, 2)
0x191a8ac76d0 node(3, 3)
0x191a8ac7760 node(4, 4)
0x191a8ac76d0 node(5, 5)
0x191a8ac7760 node(6, 6)
0x191a8ac76d0 node(7, 7)
0x191a8ac7760 node(8, 8)
0x191a8ac76d0 node(9, 9)

Basically a head node is defined, then 10 more nodes are linked sequentially. But checking the address with id, shows that there are only 2 unique objects. And the content of nodes looks correct.

My question is: Why does this happen? Is it because of some internal implementation of numba's jitclass, or CPython?

Additional notes:

  • If you comment out the jit part, the code runs as expected: each node has different addresses.
  • More confusingly, if you copy the same code in a Jupyter notebook, the result is correct.

I notice a similar issue here: Object comparison in numba jitclass. I can mitigate the same issue with my newly added key attribute. I'm only trying to understand why this happens.


Solution

  • It is not a bug but an artefact of the way Numba works internally. Indeed, id gives you the address of a Python object but Numba does not operate on Python object internally but its own C types. Python objects are just wrapper types so to be used from a Python code. Your loop create new temporary Python objects for each iteration. Only two Python objects are needed: node and node.next. The previous objects are deleted due to reference counting. The allocator of CPython recycle the addresses because reusing addresses it generally more efficient (due to cache locality). The new objects are different and refer to different Numba object address internally but the address are the same. A similar effect often happen in C/C++ when objects are created on the stack. It does not happen with a pure-Python code because Python operate on persistent objects and not wrappers. Note that the creation of temporary objects and JIT function calls for each iteration is not very efficient and thus I expect the code not to be much faster with Numba. By the way, the effect would disappear if foo was a njit function.

    Put it shortly, id(node) does not works the way you think. It operates on wrapper Numba objects so is not very useful here.