Search code examples
pythoncswitch-statementcythoncompiler-optimization

Seeking Cython Optimization for Conditional Branching: Is There an Equivalent to switch?


I am currently working on a Python project that I need to rewrite in Cython to enhance performance. In this Python code, there is a segment that uses a series of if/elif statements to determine the execution path for a task, which involves hundreds of branches. Additionally, this segment of code will be executed repeatedly. I am looking for a way to rewrite this in Cython that might offer performance improvements similar to the switch statement found in other languages.

For context, languages like Java implement the switch statement in such a way that its time complexity can be O(1) for tableswitch or O(log n) for lookupswitch, depending on the sparsity of cases. This is achieved through compiler optimizations that are not present in the linear time complexity (O(n)) of chained if/else if statements. Similarly, C/C++ compilers can optimize switch statements to enhance performance (by jump table O(1) and binary search O(log n)).

Given that Cython compiles code to C, I am wondering if there is a syntax or construct in Cython that behaves like the switch statement, potentially benefiting from compiler optimizations to avoid the linear time complexity. Does Cython offer any such feature, or are there recommended practices for achieving similar performance improvements when dealing with a large number of conditional branches?

Please note that in Python 3.10, a syntax resembling the switch statement, known as match/case, was introduced. However, it lacks compiler optimizations, maintaining a time complexity of O(n).

I have also considered using a Python dictionary and encapsulating each if/elif branch into a separate function to determine the branching function, as shown below:

func_map = {
    0: func0,
    1: func1,
    3: func3,
    10: func10,
}

However, I found that the overhead of using a Python dictionary is too significant. Additionally, the functions func0, func1, func3 etc., have different parameter lists (both in terms of the number of parameters and their names), which makes the approach quite inelegant.


Solution

  • Cython generally aims to be Python syntax+C typing, so isn't really aiming to add new flow control features. So there isn't a specific feature for what you're after.

    However, Cython does already recognise simple chained if/elif and convert them to switch statements in C:

    def f(int i):
        if i==0:
            print("aaa")
        elif i==5 or i==7:
            print("bbb")
        elif i==9:
            print("ccc")
        else:
            print("ddd")
    

    is changed to:

    switch (__pyx_v_i) {
        case 0:
         __pyx_t_1 = __Pyx_PyObject_Call(__pyx_builtin_print, __pyx_tuple_, NULL); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 3, __pyx_L1_error)
        __Pyx_GOTREF(__pyx_t_1);
        __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
        break;
        case 5:
        case 7:
        __pyx_t_1 = __Pyx_PyObject_Call(__pyx_builtin_print, __pyx_tuple__2, NULL); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 5, __pyx_L1_error)
        __Pyx_GOTREF(__pyx_t_1);
        __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
        break;
        case 9:
        __pyx_t_1 = __Pyx_PyObject_Call(__pyx_builtin_print, __pyx_tuple__3, NULL); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 7, __pyx_L1_error)
        __Pyx_GOTREF(__pyx_t_1);
        __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
        break;
        default:
        __pyx_t_1 = __Pyx_PyObject_Call(__pyx_builtin_print, __pyx_tuple__4, NULL); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 9, __pyx_L1_error)
        __Pyx_GOTREF(__pyx_t_1);
        __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
        break;
      }
    

    Note that the more complicated case i==5 or i==7 is also translated correctly.

    To give the maximum chance of this working, the argument (i here) needs to be an integer/enum type, and the values you're comparing with should be integer literals or enum values. You should also not include extra logic in the if statements (elif i==9 and something_else probably won't work for example).

    This optimization can be controlled by the optimize.use_switch compiler directive, however it's on by default.

    Obviously the problem is that this is a pretty opaque optimization and you don't get any warning if Cython doesn't make it for some reason. But equally, I don't think the O(1) switch statements are guaranteed by C or C++ so you're also relying on opaque compiler stuff there.


    A few notes on match/case:

    • it wasn't designed as a switch replacement - it was more for more unpacking more complicated nested dict/list structures. The switch-like part of it is a building block for the full behaviour rather than the intended main use.
    • it isn't implemented in the main Cython branch yet,
    • if you want to try it it's available in the patma-preview branch.
      • that's a few thousand lines of unreviewed code so treat with caution,
      • no effort is made to keep that branch up-to-date, or provide wheels or anything like that so it's there on a DIY basis,
      • simple match/case blocks should be translated to switch in the same way that chained if-statements are. So if you prefer the syntax you can get the optimization described above.