I am learning about different ways to interface Python and C. I have a function that sums the integers from 0 up to (but not including) its input. I've coded this function in Python, Cython, and C (interfaced using ctypes). I then timed each version by calling it 1000 times with an input of 5000, and these were the results:
Cython is way faster than the ctypes approach (where I wrote the C myself). What makes Cython so much faster than the ctypes approach? I provide all the details below:
example_py.py:
def sumTo(x):
    y = 0
    for i in range(x):
        y += i
    return y
example_cy.pyx:
cpdef int sumTo(int x):
    cdef int y = 0
    cdef int i
    for i in range(x):
        y += i
    return y
Setup file, setup.py:
from distutils.core import setup
from Cython.Build import cythonize
setup(ext_modules = cythonize('example_cy.pyx'))
Compile by running python setup.py build_ext --inplace.
example_C.h:
#ifndef EXAMPLE_C_
#define EXAMPLE_C_
int sumTo(int x);
#endif
example_C.c:
#include <stdio.h>
#include "example_C.h"
int sumTo(int x) {
    int y = 0;
    for(int i = 0; i < x; i++) {
        y += i;
    }
    return y;
}
Compile by running gcc -shared -o libcalci.so -fPIC example_C.c.
To test the ctypes approach, I ran this script:
import example_py
from ctypes import *
import time
numRuns = 1000
x = 5000
# Run test on python script
tic = time.perf_counter()
for i in range(numRuns):
    example_py.sumTo(x)
py_runtime = time.perf_counter() - tic
# Run test on c script
libCalc = CDLL("./libcalci.so")
tic = time.perf_counter()
for i in range(numRuns):
    libCalc.sumTo(x)
c_runtime = time.perf_counter() - tic
# Print results
print(py_runtime, c_runtime)
print('Ctypes is {}x faster'.format(py_runtime/c_runtime))
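A possible variation on the script above, offered only as a sketch: it uses timeit instead of hand-written perf_counter loops and declares the C signature explicitly. The helper name benchmark is hypothetical, sumTo is copied inline so the block runs on its own, and the ctypes part is skipped unless libcalci.so has already been built in the current directory. ctypes already assumes int arguments and an int return value by default, so the argtypes/restype lines are there for explicitness rather than necessity.

```python
import ctypes
import os
import timeit


def sumTo(x):
    """Pure-Python reference, same body as example_py.sumTo."""
    y = 0
    for i in range(x):
        y += i
    return y


def benchmark(func, x=5000, num_runs=1000):
    """Return the total time in seconds for num_runs calls of func(x).

    'benchmark' is a hypothetical helper, not part of the original scripts.
    """
    return timeit.timeit(lambda: func(x), number=num_runs)


py_runtime = benchmark(sumTo)
print("python:", py_runtime)

# Assumption: libcalci.so was built from example_C.c in the current directory.
lib_path = os.path.abspath("libcalci.so")
if os.path.exists(lib_path):
    libCalc = ctypes.CDLL(lib_path)
    # Declare the C signature: int sumTo(int x)
    libCalc.sumTo.argtypes = [ctypes.c_int]
    libCalc.sumTo.restype = ctypes.c_int
    # Check that both versions agree before comparing their speed.
    assert libCalc.sumTo(5000) == sumTo(5000)
    c_runtime = benchmark(libCalc.sumTo)
    print("ctypes:", c_runtime)
    print("Ctypes is {}x faster".format(py_runtime / c_runtime))
```

The lambda wrapper adds a small, equal per-call cost to every measurement, so very short C calls will look slightly slower than they are.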
To test the cython approach, I ran this script:
import time
import example_cy
import example_py
numRuns = 1000
x = 5000
# Run test on python script
tic = time.perf_counter()
for i in range(numRuns):
    example_py.sumTo(x)
py_runtime = time.perf_counter() - tic
# Run test on cython module
tic = time.perf_counter()
for i in range(numRuns):
    example_cy.sumTo(x)
c_runtime = time.perf_counter() - tic
# Print results
print(py_runtime, c_runtime)
print('Cython is {}x faster'.format(py_runtime/c_runtime))
Any thoughts on why the ctypes approach is so much slower than the Cython approach? Thank you for your time and wisdom!
Thanks to Jérôme Richard! The ctypes approach was slower than the Cython approach in my tests because I wasn't using optimization flags when compiling the C code.
As suggested, compiling with the -O3 flag made the ctypes version 214x faster than Python, -O3 -mavx2 made it 315x faster, and -O3 -mavx2 -march=native made it 325x faster.
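For anyone who wants to repeat the comparison, here is a rough sketch of how the flag combinations above could be built and timed from one script. It assumes gcc is on the PATH, example_C.c is in the current directory, and the machine is x86-64 (otherwise -mavx2 will not compile). The names FLAG_SETS, build_command, and time_library are hypothetical, and each build is written to its own file (libcalci_0.so, libcalci_1.so, ...) so that an already-loaded library is never overwritten.

```python
import ctypes
import os
import shutil
import subprocess
import time

# Flag combinations mentioned in the update, plus the original unoptimized build.
FLAG_SETS = [
    [],
    ["-O3"],
    ["-O3", "-mavx2"],
    ["-O3", "-mavx2", "-march=native"],
]


def build_command(flags, source="example_C.c", output="libcalci.so"):
    """Return the gcc command line (as a list) for a shared-library build."""
    return ["gcc", *flags, "-shared", "-o", output, "-fPIC", source]


def time_library(path, x=5000, num_runs=1000):
    """Load the library at path and return seconds for num_runs calls of sumTo(x)."""
    lib = ctypes.CDLL(os.path.abspath(path))
    lib.sumTo.argtypes = [ctypes.c_int]
    lib.sumTo.restype = ctypes.c_int
    tic = time.perf_counter()
    for _ in range(num_runs):
        lib.sumTo(x)
    return time.perf_counter() - tic


# Only build and time when the compiler and the source file are available.
if shutil.which("gcc") and os.path.exists("example_C.c"):
    for n, flags in enumerate(FLAG_SETS):
        output = "libcalci_{}.so".format(n)
        subprocess.run(build_command(flags, output=output), check=True)
        print(" ".join(flags) or "(no flags)", time_library(output))
```

The absolute numbers will differ from machine to machine, so the ratios above should be read as results from my setup only.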