Search code examples
cythonbitwise-operatorsbit-shiftinteger-arithmetic

C-Style Arithmetic in Cython on unsigned int


Is there an easy way to make left shift, and subtraction work as they would in C on unsigned integers with Cython?

For instance:

def left_shift(unsigned int x, unsigned int shift):
  return x << shift

def main():
  print left_shift(0xffffffff, 4)
  print left_shift(0xffffffff, 8)
  print left_shift(0xffffffff, 12)

I would expect this to print the decimal equivalent of

0xfffffff0
0xffffff00
0xfffff000

and that is actually what I get.

4294967280
4294967040
4294963200

However, if I try to do something more elaborate such as use one of Jenkins' hash functions on a large input, this is what I get:

def hash_fcn1(unsigned int key):
  key = (key ^ 0xdeadbeef) + (key << 4)
  key = key ^ (key >> 10)
  key = key + (key << 7)
  key = key ^ (key >> 13)
  return key

hash_fcn1(0xffffffff)

File "./hash_fcn_test.py", line 94, in <module>
    main()
  File "./hash_fcn_test.py", line 60, in main
    print hash_fcn1(0xffffffff)
  File "hash_fcns.pyx", line 6, in hash_fcns.hash_fcn1 (/home/medusa/.pyxbld/temp.linux-x86_64-2.7/pyrex/hash_fcns.c:854)
    key = (key ^ 0xdeadbeef) + (key << 4)
**OverflowError: value too large to convert to unsigned int**

Similar issues arise when the value of a computation would result in a negative number. Is there any way around these issues? I would like the computation to behave just as it does in C. Is that too much to ask? I've scoured the web, and it appears that the common practice is just to bitwise and (&) each result with MAX_INT, but that is very heavy handed.

Is there just a flag that I can set either in the Cython compiler or elsewhere?


Solution

  • I believe, the arithmetic type if cython depends on the types of numbers being operated on. I believe the issue in your code is on this line, key = (key ^ 0xdeadbeef) + (key << 4). Cython translates this line to:

      __pyx_t_1 = __Pyx_PyInt_From_unsigned_int(__pyx_v_key); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 13; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
      __Pyx_GOTREF(__pyx_t_1);
      __pyx_t_2 = PyNumber_Xor(__pyx_t_1, __pyx_int_3735928559); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 13; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
      __Pyx_GOTREF(__pyx_t_2);
      __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
      __pyx_t_1 = __Pyx_PyInt_From_long((__pyx_v_key << 4)); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 13; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
      __Pyx_GOTREF(__pyx_t_1);
      __pyx_t_3 = PyNumber_Add(__pyx_t_2, __pyx_t_1); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 13; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
      __Pyx_GOTREF(__pyx_t_3);
      __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
      __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
      __pyx_t_4 = __Pyx_PyInt_As_unsigned_int(__pyx_t_3); if (unlikely((__pyx_t_4 == (unsigned int)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 13; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
      __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
      __pyx_v_key = __pyx_t_4;
    

    What you probably want is this key = (key ^ <unsigned int> 0xdeadbeef) + (key << 4), which gets translated to:

    __pyx_v_key = ((__pyx_v_key ^ ((unsigned int)0xdeadbeef)) + (__pyx_v_key << 4));
    

    Big difference right :). You might find the need for an explicit cast here surprising, but I think it makes sense. In cython everything behaves the way it would in pytyhon, unless explicitly told to do something different. Here cython treats 0xdeadbeef as a python int type, unless you explicitly cast it or assign it to a typed variable.

    If you're not already using it, I would highly recommend using cython -a and reviewing the html file that's created. It highlights your code in different shades of yellow depending on how directly each line can be converted to c. It makes catching subtle things like this much easier.