python numpy optimization sympy callable-object

How to create a truly callable array or matrix in Python

I would like to take make a matrix that has all its entries as a function of some variable x. Hence B(x) would give N x N output ideally in a fast manner. This is a simple task in fact if you are willing to type out the matrix with functions as entries. As an example:

f1 = lambda x: 2*x
f2 = lambda x: x**2
B = lambda x : np.array([[f1(x),f2(x)],
                         [f2(x),f1(x)]])

This is naive as it is not scale-able to a scenario where your array is large and has a variety of functions. Typing it out would take a long time for a single problem. One would usually create an empty array and use two Python for loops to compute the specific function for a given entry and then deposit the output in the array. The array is then returned.

The problem with the above method is that each time the function is called, it runs those for loops. This makes it slow if you want to run the function on a dataset of x values. I've tried to create a static callable array using Sympy's lambdfiy function. For an evaluation of x it seems faster than the for loop solution in pure Python. However, this is horribly outweighed by the setup cost. Please see my code below for the details.

Is there a way to use the vectorize function in Numpy to speed things up? Can you perhaps find a solution that is faster than the for loop version?

I also do play around with the idea (or call it a dream) where one can evaluate the entire dataset of X instead of each x separately. Like broadcasting in Numpy. E.g.

# Naive
result1 = [np.sin(x) for x in X]
# vs 
results2 = np.sin(X)

Anyways, that's quite far-fetched. Here is the code I have written. Please do play around with the size of N to see how fascinating the decline in speed is. Just to clarify, I have evaluated my program as a whole and this issue of the callable array is where the bottleneck lies.

import numpy as np
from sympy import symbols,lambdify,zeros
from time import time


def get_sympy(N):
    '''
    Creates a callable array using Sympys lambdfiy capabilites.
    This is truly a callable array.
    '''
    x = symbols('x')
    output = zeros(N,N)
    for i in range(N):
        for j in range(N):
            if i == j:
                output[i,j] = x**2
            elif i == 1:
                output[i,j] = x**3
            elif j == 0:
                output[i,j] = x**4
            else:
                output[i,j] = x
    return lambdify(x,output,'numpy')

def get_python(x,N):
    '''
    Uses Python loops to setup an array that mimic that of a callable array.
    It is not truly a callable array as the loops run on each call of 
    this function.
    '''
    output = np.zeros((N,N))
    f1 = lambda x: x**2
    f2 = lambda x: x**3
    f3 = lambda x: x**4
    for i in range(N):
        for j in range(N):
            if i == j:
                output[i,j] = f1(x)
            elif i == 1:
                output[i,j] = f2(x)
            elif j == 0:
                output[i,j] = f3(x)
            else:
                output[i,j] = x
    return output


if __name__ == '__main__':
    N = 30
    X = np.random.uniform()
    callable_sympy_array = get_sympy(N)
    callable_python_array = lambda x: get_python(x,N)
    t1 = time()
    sympy_result = callable_sympy_array(X)
    t2 = time()
    python_result = callable_python_array(X)
    t3 = time()
    sympy_func = get_sympy(N)
    t4 = time()
    sympy_time = t2-t1
    python_time = t3-t2
    sympy_setup_time = t4-t3

    print('\nSingle Evaluation Results:\n')
    print('Sympy: ',round(sympy_time,5))
    print('Python: ',round(python_time,5))
    print('Sympy + Setup',round(sympy_setup_time,5))

    evals = 100
    print('\nResults for {0} evaluations of a {1} by {1} array:\n'.format(evals,N))
    print('Sympy: ',sympy_setup_time + evals*sympy_time)
    print('Python: ',python_time*evals)

Solution

Fast numpy evaluation requires applying the built-in compiled operators/functions to whole arrays. Any sort of python level iteration slows you down, as does evaluating (general) Python functions on scalars. The fast stuff is mostly limited to the operators (like **) and ufunc (np.sin, etc).

Your sympy generated function illustrates this:

In an isympy session:

In [65]: M = get_sympy(3)

using the ipython code introspection:

In [66]: M??                                                                                              
Signature: M(x)
Docstring:
Created with lambdify. Signature:

func(x)

Expression:

Matrix([[x**2, x, x], [x**3, x**2, x**3], [x**4, x, x**2]])

Source code:

def _lambdifygenerated(x):
    return (array([[x**2, x, x], [x**3, x**2, x**3], [x**4, x, x**2]]))


Imported modules:
Source:   
def _lambdifygenerated(x):
    return (array([[x**2, x, x], [x**3, x**2, x**3], [x**4, x, x**2]]))
File:      /<lambdifygenerated-8>
Type:      function

So this a function in x, using numpy operations, the ** operator, and array creation. Just as though you had typed it in. sympy creates this by lexical substitutions in its symbolic code, so you can say it does 'type it in'.

It can operate on a scalar

In [67]: M(3)                                                                                             
Out[67]: 
array([[ 9,  3,  3],
       [27,  9, 27],
       [81,  3,  9]])

on an an array, here producing a (3,3,3) result:

In [68]: M(np.arange(1,4))                                                                                
Out[68]: 
array([[[ 1,  4,  9],
        [ 1,  2,  3],
        [ 1,  2,  3]],

       [[ 1,  8, 27],
        [ 1,  4,  9],
        [ 1,  8, 27]],

       [[ 1, 16, 81],
        [ 1,  2,  3],
        [ 1,  4,  9]]])

I expect that it is easy to write a sympy expression that when translated, can't take array argument(s). if tests are notoriously hard to write in an array compatible form, since a Python if expression only works on scalar booleans.

Your get_python won't take an array x, primarily because the

 output = np.zeros((N,N))

has a fixed size; using np.zeros((N,N)+x.shape), x.dtype) might get around that.

In any case, it will be slow because of the python level iteration at each call.

===

It would be faster if you tried to assign groups of elements. For example in this case:

In [76]: output = np.zeros((3,3),int)                                                                     
In [77]: output[:] = 3                                                                                    
In [78]: output[:,0]=3**4                                                                                 
In [79]: output[1,:]=3**3                                                                                 
In [80]: output[np.arange(3),np.arange(3)]=3**2                                                           

In [81]: output                                                                                           
Out[81]: 
array([[ 9,  3,  3],
       [27,  9, 27],
       [81,  3,  9]])

===

frompyfunc is a handy tool for cases like this. In some cases it offers a 2x speed improvement over direct iteration. But even without that it can make the code more concise.

For example, a quick write up of your example:

In [82]: def foo(x,i,j): 
    ...:     if i==j: return x**2 
    ...:     if i==1: return x**3 
    ...:     if j==0: return x**4 
    ...:     return x                                                                                             
In [83]: f = np.frompyfunc(foo, 3, 1)                                                                     

In [84]: f(3,np.arange(3)[:,None], np.arange(3))                                                          
Out[84]: 
array([[9, 3, 3],
       [27, 9, 27],
       [81, 3, 9]], dtype=object)

and for the Out[68] example:

In [98]: f(np.arange(1,4),np.arange(3)[:,None,None], np.arange(3)[:,None]).shape                          
Out[98]: (3, 3, 3)

In [99]: timeit f(np.arange(1,4),np.arange(3)[:,None,None], np.arange(3)[:,None]).shape                   
23 µs ± 471 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [100]: timeit M(np.arange(1,4))                                                                        
21.7 µs ± 440 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Evaluated on a scalar x, my f is about the same speed as your get_python.

In [115]: MM = get_sympy(30)                                                                              
In [116]: timeit MM(3)                                                                                    
109 µs ± 112 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [117]: timeit get_python(3,30)                                                                         
241 µs ± 2.06 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [118]: timeit f(3,np.arange(30)[:,None], np.arange(30)).astype(int)                                    
254 µs ± 1.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)