I am trying to implement a HashMap in Cython. This is just an exercise for familiar myself with Cython, hope to do more stuff with it soon.
Here are my codes:
#cython language_level=3
from libc.stdlib cimport malloc, free
from cpython.ref cimport PyObject, Py_INCREF, Py_CLEAR, Py_DECREF, Py_TYPE, PyTypeObject, Py_XDECREF, Py_XDECREF, Py_XINCREF
cdef struct Entry:
long long key
PyObject* value
cdef class HashMap:
cdef Entry** table
cdef long long size
cdef long long capacity
def __cinit__(self, long long capacity=16):
cdef Entry** entry
self.capacity = capacity
self.size = 0
self.table = <Entry**>malloc(capacity * sizeof(Entry))
entry = self.table
for i in range(self.capacity):
entry[i] = <Entry*>malloc(sizeof(Entry))
def __dealloc__(self):
cdef Entry** entry
for i in range(self.capacity):
entry = <Entry**>self.table
if entry != NULL:
free(entry)
free(self.table)
cdef long long hash(self, long long x):
x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9
x = (x ^ (x >> 27)) * 0x94d049bb133111eb
x = x ^ (x >> 31)
return x
cpdef put(self, long long key, object value):
cdef Entry** entry
key = self.hash(key)
if len(self) == 0:
self.table[0].key = key
self.table[0].value = <PyObject*>value
self.size += 1
return
entry = <Entry**>self.table
for i in range(self.capacity):
if entry == NULL:
entry[0].key = key
entry[0].value = <PyObject*>value
self.size += 1
break
cpdef get(self, long long key):
cdef Entry** entry
find_key = self.hash(key)
entry = <Entry**>self.table
for i in range(self.capacity):
if entry[0].key == find_key:
return <object>(entry[0].value)
raise KeyError(key)
def __len__(self):
return self.size
After I built and test the codes with the following:
>>> from hashmap.hashmap import HashMap
>>> hm = HashMap()
>>> hm.put(27, "hello")
I encountered the following error:
OverflowError: int too big to convert
Exception ignored in: 'hashmap.HashMap.hash'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OverflowError: int too big to convert
Initially I used int
and changed to long long
due to OverflowError occurs when using cython with a large int.
The issue is that your constants in the hash
function are not typed, so cython is treating them as python integers (arbitrarily big, not wrapping), then doing the math in this system which gives a number bigger than can be assigned back into x
, giving the OverflowError. The fix is something like
cdef long long hash(self, long long x):
x = (x ^ (x >> 30)) * <long long>(0xbf58476d1ce4e5b9)
x = (x ^ (x >> 27)) * <long long>(0x94d049bb133111eb)
x = x ^ (x >> 31)
return x
However note that you currently also have a memory management bug in your put
function, which will cause a crash when trying to get
...