I want to optimise the solving of a system of time-based ODEs by combining Numba's cfunc with scipy's LowLevelCallable functionality to greatly speed up odeint(). However, I am having trouble working out the exact syntax for the correct solution.
My (non-optimised) code that works looks something like this:
import numpy as np
import numba as nb
from numba import types
from matplotlib import pyplot as plt
import scipy.integrate as si
# hours in the simulation
T_sim = 72
# simulate fluctuating outside temp for 3 days
T_out = np.sin(np.linspace(0, T_sim, T_sim)/3.8) * 8 + 18
# heat capacity of the air and floor
cap_air, cap_flr = 1, 10
def func(T, t):
    T_air, T_flr = T[0], T[1]
    t_ = int(t)
    H_airOut = 4 * U_vent[t_] * (T_air - T_out[t_])
    H_blowAir = U_blow[t_]
    H_airFlr = (T_air - T_flr)
    dT_air = (H_blowAir - H_airOut - H_airFlr) / cap_air
    dT_flr = H_airFlr / cap_flr
    return dT_air, dT_flr
T0 = [16, 18] # starting temperature for T_air and T_flr
t_eval = np.linspace(0, T_sim-1, T_sim-1)
U_blow = np.full(T_sim, 0.1) # control for the heating
U_vent = np.full(T_sim, 0.3) # control for the ventilation
result = si.odeint(func, T0, t_eval)
# this piece of code is a standin for the real code, where func() is run
# tens of thousands of times. All variables are constant, except for the
# two arrays U_blow and U_vent
for _ in range(10):
    U_blow = np.random.rand(T_sim)
    U_vent = np.random.rand(T_sim)
    result = si.odeint(func, T0, t_eval)
plt.plot(result[:, 0], label='T_air')
plt.plot(result[:, 1], label='T_flr')
plt.plot(T_out[:-1], label='T_out')
plt.legend()
Based on other SO posts, I'm pretty sure the solution should look something like the code below, but I can't get it to work myself. The problem would have been much easier if only a few variables changed between runs, but this is not the case: I need to be able to vary the arrays U_blow and U_vent between runs.
import scipy as sp
from numba.types import float64, CPointer, intc
def jit_integrand_function(integrand_function):
    jitted_function = nb.jit(integrand_function, nopython=True)

    @nb.cfunc(float64(intc, CPointer(float64)))
    def wrapped(n, xx):
        return jitted_function(xx[0], xx[1])

    return sp.LowLevelCallable(wrapped.ctypes)
@jit_integrand_function
def func(T, t):
    T_air, T_flr = T[0], T[1]
    t_ = int(t)
    H_airOut = 4 * U_vent[t_] * (T_air - T_out[t_])
    H_blowAir = U_blow[t_]
    H_airFlr = (T_air - T_flr)
    dT_air = (H_blowAir - H_airOut - H_airFlr) / cap_air
    dT_flr = H_airFlr / cap_flr
    return dT_air, dT_flr
T0 = [16, 18] # starting temperature for T_air and T_flr
t_eval = np.linspace(0, T_sim-1, T_sim-1)
for _ in range(10):
    U_blow = np.random.rand(T_sim)
    U_vent = np.random.rand(T_sim)
    result = si.odeint(func, T0, t_eval, (U_blow, U_vent))
plt.plot(result[:, 0], label='T_air')
plt.plot(result[:, 1], label='T_flr')
plt.plot(T_out[:-1], label='T_out')
plt.legend()
As a sidenote, my full model is not numerically stable enough for the forward Euler integration method to work, so that is not an option.
My attempt at an answer is as follows, working from this complete example:
import scipy as sp
import scipy.integrate as si
import numpy as np
# hours in the simulation
T_sim = 72
# simulate fluctuating outside temp for 3 days
T_out = np.sin(np.linspace(0, T_sim, T_sim) / 3.8) * 8 + 18
# heat capacity of the air and floor
cap_air, cap_flr = 1, 10
U_blow = np.random.rand(T_sim)
U_vent = np.random.rand(T_sim)
def example_rhs(T, t, U_blow, U_vent):
    T_air, T_flr = T[0], T[1]
    t_ = int(t)
    H_airOut = 4 * U_vent[t_] * (T_air - T_out[t_])
    H_blowAir = U_blow[t_]
    H_airFlr = (T_air - T_flr)
    dT_air = (H_blowAir - H_airOut - H_airFlr) / cap_air
    dT_flr = H_airFlr / cap_flr
    return dT_air, dT_flr
# starting temperature for T_air and T_flr
T0 = [16, 18]
t_eval = np.linspace(0, T_sim - 1, T_sim - 1)

# Repeat the integration a number of times so the profiler
# has something substantial to measure
for i in range(100):
    si.odeint(example_rhs, T0, t_eval, args=(U_blow, U_vent))
To analyze the performance of any Python code, I would start with cProfile, run via python -m cProfile -s cumtime test.py. This gives me the following (abridged):
ncalls tottime percall cumtime percall filename:lineno(function)
[...]
100 0.001 0.000 2.021 0.020 _odepack_py.py:28(odeint)
100 0.843 0.008 2.017 0.020 {built-in method scipy.integrate._odepack.odeint}
[...]
973000 1.174 0.000 1.174 0.000 test.py:16(example_rhs)
As we can see, the code spends ~2 seconds inside the odeint function, a built-in that calls Fortran code to do the actual solving. Within that time, the Python right-hand side is called 973000 times, i.e. 9730 times per integration (though this may vary for different inputs), yet only about half of it, ~1.174 seconds, is spent inside the right-hand side. So even if your callback were so optimised that its running time were zero, you would still need ~0.8 seconds for the entire operation. Since that time is spent in Fortran code, I don't think there is a way to speed up that part. So I don't think an order-of-magnitude improvement is possible, at least with respect to this particular instance (but maybe your original instance behaves differently?).
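That said, roughly half of the runtime is still the Python callback, and you can recover much of that without LowLevelCallable, simply because odeint accepts any Python callable, including a Numba-compiled one. Here is a minimal sketch of that idea; the rewritten signature (passing U_blow, U_vent, T_out and the capacities as explicit arguments so the compiled function does not rely on globals) is my own assumption, not your original interface:

import numpy as np
import numba as nb
import scipy.integrate as si

T_sim = 72
T_out = np.sin(np.linspace(0, T_sim, T_sim) / 3.8) * 8 + 18
t_eval = np.linspace(0, T_sim - 1, T_sim - 1)
U_blow, U_vent = np.random.rand(T_sim), np.random.rand(T_sim)

# A jitted function is still an ordinary Python callable, so odeint can
# call it directly; only the per-call dispatch overhead remains.
@nb.njit
def fast_rhs(T, t, U_blow, U_vent, T_out, cap_air, cap_flr):
    t_ = int(t)
    T_air, T_flr = T[0], T[1]
    H_airOut = 4.0 * U_vent[t_] * (T_air - T_out[t_])
    H_airFlr = T_air - T_flr
    return np.array([(U_blow[t_] - H_airOut - H_airFlr) / cap_air,
                     H_airFlr / cap_flr])

result = si.odeint(fast_rhs, [16.0, 18.0], t_eval,
                   args=(U_blow, U_vent, T_out, 1.0, 10.0))

This will not reach the theoretical ~0.8 s floor, because each call still passes through the Python layer, but it should eat into the ~1.2 s currently spent executing the right-hand side in pure Python.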
It is a shame that LowLevelCallables are apparently not supported for ODE integration in scipy; it seems like that would be an obvious addition (though maybe there is something preventing it?!).
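For contrast, this is what the pattern looks like in a place where scipy does support it, e.g. scipy.integrate.quad. The integrand and its parameter below are made up purely for illustration; this does not help with odeint, it just shows the cfunc + LowLevelCallable mechanics:

import numpy as np
import numba as nb
from numba import types
import scipy.integrate as si
from scipy import LowLevelCallable

# quad accepts a compiled callback with the signature
# double f(int n, double *xx), where xx[0] is the integration variable
# and xx[1:] are the extra values passed via args=
@nb.cfunc(types.float64(types.intc, types.CPointer(types.float64)))
def integrand(n, xx):
    x = xx[0]
    a = xx[1]
    return np.exp(-a * x * x)

value, err = si.quad(LowLevelCallable(integrand.ctypes), 0.0, 1.0, args=(2.0,))

Here quad jumps straight into the compiled code on every evaluation without touching the interpreter, which is exactly the behaviour one would like odeint to offer.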
Edit: As per Jérôme Richard's comments: the suite is indeed old, so maybe another library will help. You could try SUNDIALS, which is a pretty state-of-the-art ODE/DAE solver (about the best that I know of) and also has a Python interface. If your ODE is stiff, you probably want to use a BDF solver, in which case you should also provide the Jacobian of your right-hand side, provided that your model is sufficiently smooth (see the comments).
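To make the "BDF plus Jacobian" suggestion concrete, here is a rough sketch staying within scipy (solve_ivp is not SUNDIALS, and the Jacobian is simply hand-derived from the example right-hand side above, so treat it as an illustration of the shape of such a setup, not as your model):

import numpy as np
import scipy.integrate as si

T_sim = 72
T_out = np.sin(np.linspace(0, T_sim, T_sim) / 3.8) * 8 + 18
cap_air, cap_flr = 1, 10
U_blow, U_vent = np.random.rand(T_sim), np.random.rand(T_sim)

def rhs(t, T):  # note: solve_ivp uses the (t, y) argument order
    t_ = min(int(t), T_sim - 1)  # guard against a tiny overshoot of the final time
    T_air, T_flr = T
    H_airOut = 4 * U_vent[t_] * (T_air - T_out[t_])
    H_airFlr = T_air - T_flr
    return [(U_blow[t_] - H_airOut - H_airFlr) / cap_air,
            H_airFlr / cap_flr]

def jac(t, T):  # analytic Jacobian d(rhs)/dT, used by the implicit BDF steps
    t_ = min(int(t), T_sim - 1)
    return [[-(4 * U_vent[t_] + 1) / cap_air, 1 / cap_air],
            [1 / cap_flr, -1 / cap_flr]]

sol = si.solve_ivp(rhs, (0, T_sim - 1), [16, 18], method='BDF', jac=jac,
                   t_eval=np.linspace(0, T_sim - 1, T_sim - 1))

With a genuinely stiff model, supplying the analytic Jacobian usually saves the solver a large number of right-hand-side evaluations compared to letting it approximate the Jacobian by finite differences.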