Search code examples
hashmapcython

OverflowError: int too big to convert in Cython


I am trying to implement a HashMap in Cython. This is just an exercise for familiar myself with Cython, hope to do more stuff with it soon.

Here are my codes:

#cython language_level=3

from libc.stdlib cimport malloc, free
from cpython.ref cimport PyObject, Py_INCREF, Py_CLEAR, Py_DECREF, Py_TYPE, PyTypeObject, Py_XDECREF, Py_XDECREF, Py_XINCREF

cdef struct Entry:
    long long key
    PyObject* value

cdef class HashMap:
    cdef Entry** table
    cdef long long size
    cdef long long capacity

    def __cinit__(self, long long capacity=16):
        cdef Entry** entry
        self.capacity = capacity
        self.size = 0
        self.table = <Entry**>malloc(capacity * sizeof(Entry))
        entry = self.table
        for i in range(self.capacity):
            entry[i] = <Entry*>malloc(sizeof(Entry))

    def __dealloc__(self):
        cdef Entry** entry
        for i in range(self.capacity):
            entry = <Entry**>self.table
            if entry != NULL:
                free(entry)
        free(self.table)

    cdef long long hash(self, long long x):
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9
        x = (x ^ (x >> 27)) * 0x94d049bb133111eb
        x = x ^ (x >> 31)
        return x

    cpdef put(self, long long key, object value):
        cdef Entry** entry
        key = self.hash(key)
        if len(self) == 0:
            self.table[0].key = key
            self.table[0].value = <PyObject*>value
            self.size += 1
            return        
        entry = <Entry**>self.table
        for i in range(self.capacity):
            if entry == NULL:
                entry[0].key = key
                entry[0].value = <PyObject*>value
                self.size += 1
                break

    cpdef get(self, long long key):
        cdef Entry** entry
        find_key = self.hash(key)
        entry = <Entry**>self.table
        for i in range(self.capacity):
            if entry[0].key == find_key:
                return <object>(entry[0].value)
        raise KeyError(key)

    def __len__(self):
        return self.size

After I built and test the codes with the following:

>>> from hashmap.hashmap import HashMap
>>> hm = HashMap()     
>>> hm.put(27, "hello")

I encountered the following error:

OverflowError: int too big to convert
Exception ignored in: 'hashmap.HashMap.hash'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: int too big to convert

Initially I used int and changed to long long due to OverflowError occurs when using cython with a large int.


Solution

  • The issue is that your constants in the hash function are not typed, so cython is treating them as python integers (arbitrarily big, not wrapping), then doing the math in this system which gives a number bigger than can be assigned back into x, giving the OverflowError. The fix is something like

    cdef long long hash(self, long long x):
       x = (x ^ (x >> 30)) * <long long>(0xbf58476d1ce4e5b9)
       x = (x ^ (x >> 27)) * <long long>(0x94d049bb133111eb)
       x = x ^ (x >> 31)
       return x
    

    However note that you currently also have a memory management bug in your put function, which will cause a crash when trying to get...