I'm trying to implement an LRU cache with doubly linked list and hash map. The doubly linked list is quite simple to implement by following the official example using jitclass. However, some bug in the bug made me check the logic in the linked list implementation. In the end, I'm able to reproduce the issue with the following minimal example:
from numba import int32, deferred_type, optional
from numba.experimental import jitclass
node_type = deferred_type()
@jitclass([
('key', int32),
('value', int32),
('next', optional(node_type))
])
class LinkedNode:
def __init__(self, key, value, next):
self.key = key
self.value = value
self.next = next
node_type.define(LinkedNode.class_type.instance_type)
def print_node(node):
print(hex(id(node)), f'node({node.key}, {node.value})')
def foo():
head = LinkedNode(-1, -1, None)
curr = head
for i in range(10):
new_node = LinkedNode(i, i, None)
curr.next = new_node
curr = new_node
node = head
for i in range(10):
node = node.next
print_node(node)
foo()
If you save it in a .py file and run it from terminal, you will get some weird result like the following:
$ python minimal_example.py
0x191a8ac7760 node(0, 0)
0x191a8ac76d0 node(1, 1)
0x191a8ac7760 node(2, 2)
0x191a8ac76d0 node(3, 3)
0x191a8ac7760 node(4, 4)
0x191a8ac76d0 node(5, 5)
0x191a8ac7760 node(6, 6)
0x191a8ac76d0 node(7, 7)
0x191a8ac7760 node(8, 8)
0x191a8ac76d0 node(9, 9)
Basically a head node is defined, then 10 more nodes are linked sequentially. But checking the address with id
, shows that there are only 2 unique objects. And the content of nodes looks correct.
My question is: Why does this happen? Is it because of some internal implementation of numba's jitclass, or CPython?
Additional notes:
I notice a similar issue here: Object comparison in numba jitclass. I can mitigate the same issue with my newly added key
attribute. I'm only trying to understand why this happens.
It is not a bug but an artefact of the way Numba works internally. Indeed, id
gives you the address of a Python object but Numba does not operate on Python object internally but its own C types. Python objects are just wrapper types so to be used from a Python code. Your loop create new temporary Python objects for each iteration. Only two Python objects are needed: node
and node.next
. The previous objects are deleted due to reference counting. The allocator of CPython recycle the addresses because reusing addresses it generally more efficient (due to cache locality). The new objects are different and refer to different Numba object address internally but the address are the same. A similar effect often happen in C/C++ when objects are created on the stack. It does not happen with a pure-Python code because Python operate on persistent objects and not wrappers. Note that the creation of temporary objects and JIT function calls for each iteration is not very efficient and thus I expect the code not to be much faster with Numba. By the way, the effect would disappear if foo
was a njit
function.
Put it shortly, id(node)
does not works the way you think. It operates on wrapper Numba objects so is not very useful here.