The context: my Python code passes arrays of 2D vertices to OpenGL.
I tested two approaches, one with ctypes and the other with struct; the latter is more than twice as fast.

from random import random
points = [(random(), random()) for _ in xrange(1000)]

from ctypes import c_float
def array_ctypes(points):
    n = len(points)
    return n, (c_float*(2*n))(*[u for point in points for u in point])

from struct import pack
def array_struct(points):
    n = len(points)
    return n, pack("f"*2*n, *[u for point in points for u in point])

Are there any other alternatives? Any hints on how to speed up such code (and yes, this is one bottleneck of my code)?
You could try Cython. For me, this gives:

function        usec per loop:
                Python  Cython
array_ctypes      1370    1220
array_struct       384     249
array_numpy        336     339

So NumPy gives only a 15% benefit on my hardware (an old laptop running Windows XP), whereas Cython gives about 35% (without adding any extra dependency to your distributed code).
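Timings like the ones above can be reproduced with the standard timeit module. The following is only a minimal sketch, written for Python 3 (hence range rather than xrange); the helper usec_per_loop and its loops parameter are names made up for this illustration, and the numbers it prints will differ from machine to machine. It also checks that both functions from the question describe the same raw float buffer.

```python
from ctypes import c_float
from random import random
from struct import pack
from timeit import timeit

# Same test data as in the question (Python 3 spelling: range, not xrange).
points = [(random(), random()) for _ in range(1000)]

def array_ctypes(points):
    n = len(points)
    return n, (c_float * (2 * n))(*[u for point in points for u in point])

def array_struct(points):
    n = len(points)
    return n, pack("f" * 2 * n, *[u for point in points for u in point])

def usec_per_loop(func, loops=200):
    """Hypothetical helper: average time of func(points) in microseconds."""
    return timeit(lambda: func(points), number=loops) / loops * 1e6

# Both approaches should yield the same native-order 32-bit float bytes.
assert bytes(array_ctypes(points)[1]) == array_struct(points)[1]

for func in (array_ctypes, array_struct):
    print("%-14s %8.1f usec per loop" % (func.__name__, usec_per_loop(func)))
```

Averaging over a few hundred calls keeps the run short; increase loops for more stable figures.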
If you can loosen your requirement that each point is a tuple of floats, and simply make 'points' a flattened list of floats:

def array_struct_flat(points):
    n = len(points)
    return pack(
        "f"*n,
        *[
            coord
            for coord in points
        ]
    )

points = [random() for _ in xrange(1000 * 2)]

then the resulting output is the same, but the timing goes down further:
function usec per loop:
Python Cython
array_struct_flat 157
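
If the points already exist as a list of tuples, one way to check that the flattened version packs the same bytes is to flatten them once and compare. This is a sketch for Python 3; the variable names pairs and flat are made up for the illustration, and array_struct_flat is written more compactly than above. The last line also shows that struct accepts a repeat count ("2000f") in place of "f" repeated 2000 times.

```python
from itertools import chain
from random import random
from struct import pack

def array_struct(points):
    # Version from the question: points is a list of (x, y) tuples.
    n = len(points)
    return n, pack("f" * 2 * n, *[u for point in points for u in point])

def array_struct_flat(points):
    # Flat version: points is already [x0, y0, x1, y1, ...].
    n = len(points)
    return pack("f" * n, *points)

pairs = [(random(), random()) for _ in range(1000)]
flat = list(chain.from_iterable(pairs))  # flatten once, up front

# Same bytes either way; only the input layout differs.
assert array_struct_flat(flat) == array_struct(pairs)[1]

# A repeat count in the format string is equivalent to repeating "f".
assert pack("%df" % len(flat), *flat) == array_struct_flat(flat)
```

Note that the flat function returns only the packed bytes, not the (n, bytes) pair returned by the version in the question.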
Cython might be capable of substantially better than this too, if someone smarter than me wanted to add static type declarations to the code. (Running 'cython -a test.pyx' is invaluable for this: it produces an HTML file showing which lines are still slow, plain Python (yellow) and which have been converted to pure C (white). That is why I spread the code above over so many lines: the coloring is done per line, so spreading it out makes the report more useful.)
Full Cython instructions are here: http://docs.cython.org/src/quickstart/build.html
Cython might produce similar performance benefits across your whole codebase, and in ideal conditions, with proper static typing applied, can improve speed by factors of ten or a hundred.