
Execution Speed: Cython vs ctypes


I am learning about different ways to interface Python and C. I have a function that sums the integers from 0 up to, but not including, its input. I've written it in Python, in Cython, and in C (called through ctypes). I then timed 1000 calls of each version with an input of 5000, and these were the results:

  • Python: ~0.2 s
  • ctypes: ~0.01 s (~20x faster than Python)
  • Cython: ~0.0013 s (~154x faster than Python)

Cython is much faster than the ctypes approach (where I wrote the C myself). What makes Cython so much faster than ctypes? All the details are below:

Python setup

example_py.py:

def sumTo(x):
    y = 0
    for i in range(x):
        y += i
    return y
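
As a quick sanity check on the function above (not part of the original benchmark): the loop adds 0 through x - 1, so its result should match the closed form x(x - 1)/2. The sketch below verifies this for the benchmark input and confirms the value fits in a 32-bit C int, which matters because the Cython and C versions declare their accumulator as int. The name INT32_MAX is introduced here for illustration and assumes a 32-bit int.

```python
def sumTo(x):
    y = 0
    for i in range(x):
        y += i
    return y

# Assumption: C int is 32 bits wide (typical on common platforms, not guaranteed).
INT32_MAX = 2**31 - 1

x = 5000
expected = x * (x - 1) // 2  # closed form of 0 + 1 + ... + (x - 1)

# The loop and the closed form should agree.
assert sumTo(x) == expected

print(expected, expected <= INT32_MAX)  # prints: 12497500 True
```

For much larger inputs the int versions could overflow while the Python version would not, so the three implementations are only directly comparable while the sum stays within range.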

Cython setup

example_cy.pyx:

cpdef int sumTo(int x):
    cdef int y = 0
    cdef int i
    for i in range(x):
        y += i
    return y

Setup file: setup.py:

# distutils was removed from the standard library in Python 3.12; setuptools provides the same setup()
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules = cythonize('example_cy.pyx'))

Compile by running python setup.py build_ext --inplace.

C setup

example_C.h:

#ifndef EXAMPLE_C_
#define EXAMPLE_C_

int sumTo(int x);

#endif

example_C.c:

#include <stdio.h>
#include "example_C.h"

int sumTo(int x) {
    int y = 0;

    for(int i = 0; i < x; i++) {
        y += i;
    }

    return y;
}

Compile by running gcc -shared -o libcalci.so -fPIC example_C.c.

Testing scripts

To test the ctypes approach, I ran this script:

import example_py
from ctypes import CDLL
import time

numRuns = 1000
x = 5000

# Run test on python script
tic = time.perf_counter()
for i in range(numRuns):
    example_py.sumTo(x)
py_runtime = time.perf_counter() - tic

# Run test on c script
libCalc = CDLL("./libcalci.so")
tic = time.perf_counter()
for i in range(numRuns):
    libCalc.sumTo(x)
c_runtime = time.perf_counter() - tic

# Print results
print(py_runtime, c_runtime)
print('Ctypes is {}x faster'.format(py_runtime/c_runtime))
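
The script above calls libCalc.sumTo without declaring its C signature, so ctypes falls back on its default conversions. A minimal sketch of declaring the signature explicitly is shown below. To keep it runnable without libcalci.so, it uses the C runtime's abs function as a stand-in; this assumes a POSIX system (Linux or macOS). Whether this changes the timings at all is not something the original post measured.

```python
import ctypes
import ctypes.util

# Stand-in library: the C runtime. If find_library returns None, CDLL(None)
# exposes the symbols already loaded into the process (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature `int abs(int)` so ctypes checks and converts
# arguments explicitly instead of relying on its defaults.
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

print(libc.abs(-5000))

# The same pattern applied to the question's library would look like this
# (commented out because it needs ./libcalci.so to exist):
# libCalc = ctypes.CDLL("./libcalci.so")
# libCalc.sumTo.argtypes = [ctypes.c_int]
# libCalc.sumTo.restype = ctypes.c_int
```

With argtypes set, passing an argument that cannot be converted to a C int raises ctypes.ArgumentError rather than being passed through silently.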

To test the cython approach, I ran this script:

import time
import example_cy
import example_py

numRuns = 1000
x = 5000

# Run test on python script
tic = time.perf_counter()
for i in range(numRuns):
    example_py.sumTo(x)
py_runtime = time.perf_counter() - tic

# Run test on Cython module
tic = time.perf_counter()
for i in range(numRuns):
    example_cy.sumTo(x)
c_runtime = time.perf_counter() - tic

# Print results
print(py_runtime, c_runtime)
print('Cython is {}x faster'.format(py_runtime/c_runtime))
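
Both scripts time a single window of 1000 calls, which can be noisy. One possible alternative, sketched below, is to use timeit.repeat from the standard library and keep the fastest of several runs. The helper name best_time is hypothetical, and the sketch times only the pure-Python function so that it runs on its own; example_cy.sumTo or libCalc.sumTo could be passed in the same way.

```python
import timeit


def sumTo(x):
    y = 0
    for i in range(x):
        y += i
    return y


def best_time(func, x=5000, num_runs=1000, repeats=5):
    """Return the fastest total time (seconds) for num_runs calls of func(x)."""
    # The lambda adds a small, constant per-call overhead to every measurement.
    timings = timeit.repeat(lambda: func(x), number=num_runs, repeat=repeats)
    return min(timings)


py_runtime = best_time(sumTo)
print('Python: {:.4f} s for 1000 calls'.format(py_runtime))
```

Taking the minimum rather than the mean is a common choice because slower runs usually reflect interference from other processes rather than the code being measured.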

Any thoughts on why the ctypes approach is so much slower than the Cython approach? Thank you for your time and wisdom!


Solution

  • Thanks to Jérôme Richard! The ctypes approach was slower than the Cython approach because I wasn't passing any optimization flags when compiling the C code, so gcc used its default of no optimization (-O0). The Cython build, by contrast, typically inherits the optimization flags that Python itself was built with.

    Recompiling with -O3 made the ctypes version 214x faster than Python, -O3 -mavx2 made it 315x faster, and -O3 -mavx2 -march=native made it 325x faster.
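
    The comparison across flag sets can be scripted. The sketch below is one way to do it, not the procedure actually used for the numbers above: it rebuilds example_C.c once per flag set and times 1000 calls through ctypes. The helper names and the per-flag library naming scheme (for example libcalci_O3.so) are hypothetical, and the -mavx2 flag assumes an x86-64 machine. If gcc or example_C.c is missing, the script only prints a message.

```python
import ctypes
import os
import shutil
import subprocess
import time

SOURCE = "example_C.c"  # C source file from the question
FLAG_SETS = [[], ["-O3"], ["-O3", "-mavx2"], ["-O3", "-mavx2", "-march=native"]]


def library_name(flags):
    # Hypothetical naming scheme: one shared library per flag set.
    suffix = "".join(f.replace("-", "_").replace("=", "_") for f in flags)
    return "libcalci" + (suffix or "_default") + ".so"


def build_command(flags):
    # Same gcc invocation as in the question, with extra flags added.
    return ["gcc", *flags, "-shared", "-fPIC", "-o", library_name(flags), SOURCE]


def time_library(path, x=5000, num_runs=1000):
    lib = ctypes.CDLL(os.path.abspath(path))
    lib.sumTo.argtypes = [ctypes.c_int]
    lib.sumTo.restype = ctypes.c_int
    tic = time.perf_counter()
    for _ in range(num_runs):
        lib.sumTo(x)
    return time.perf_counter() - tic


if __name__ == "__main__":
    if shutil.which("gcc") is None or not os.path.exists(SOURCE):
        print("gcc or example_C.c not found; nothing to benchmark")
    else:
        for flags in FLAG_SETS:
            subprocess.run(build_command(flags), check=True)
            elapsed = time_library(library_name(flags))
            print(" ".join(flags) or "(no flags)", elapsed)
```

    Building a separate library per flag set avoids reloading a file that ctypes has already mapped into the process.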