python numpy vectorization nested-for-loop

Is there a way to speed up or vectorize a nested for loop?

I am fairly new to coding in general and to Python in particular. I am trying to apply a weighted average scheme into a big dataset, which at the moment is taking hours to complete and I would love to speed up the process also because this has to be repeated several times.

The weighted average represents a method used in marine biogeochemistry that includes the history of gas transfer velocities (k) in-between sampling dates, where k is weighted according to the fraction of water column (f) ventilated by the atmosphere as a function of the history of k and assigning more importance to values that are closer to sampling time (so the weight at sampling time step = 1 and then it decreases moving away in time):

Weight average equation extracted from (https://doi.org/10.1029/2017GB005874) pp. 1168

In my attempt I used a nested for loop where at each time step t I calculated the weighted average:

def kw_omega (k, depth, window, samples_day):
    """
        calculate  the scheme weights for gas transfer velocity of oxygen
        over the previous window of time, where the most recent gas transfer velocity
        has a weight of 1, and the weighting decreases going back in time. The rate of decrease
        depends on the wind history and MLD.

        Parameters
        ----------
        k: ndarray
            instantaneous O2 gas transfer velocity

        depth: ndarray
            Water depth

        window: integer
            weighting period in days which equals the residence time of oxygen at sampling day

        samples_day: integer
            number of samples in each day composing window

        Returns
        ---------
        weighted_kw: ndarray

        Notes
        ---------
        n = the weighting period / the time resolution of the wind data
        samples_day = the time resolution of the wind data
        omega = is the weighting coefficient at each time step within the weighting window
        f = the fraction of the water column (mixed layer, photic zone or full water column) ventilated at each time

    """
    Dt = 1./samples_day
    f = (k*Dt)/depth
    f = np.flip(f)
    k = np.flip(k)
    n = window*samples_day
    weighted_kw = np.zeros(len(k))
    for t in np.arange(len(k) - n):
        omega = np.zeros((n))
        omega[0] = 1.
        for i in np.arange(1,len(omega)):
            omega[i] = omega[i-1]*(1-f[t+(i-1)])
        weighted_kw[t] = sum(k[t:t+n]*omega)/sum(omega)
        print(f"t = {t}")
    return np.flip(weighted_kw)

This should be used on model simulation data which was set to run for almost 2 years where the model time step was set to 60 seconds, and sampling is done at intervals of 7 days. Therefore k has shape (927360) and n, representing the number of minutes in 7 days has shape (10080). At the moment it is taking several hours to run. Is there a way to make this calculation faster?

Solution

I would recommend to use the package numba to speed up your calculation.

import numpy as np
from numba import njit
from numpy.lib.stride_tricks import sliding_window_view

@njit
def k_omega(k_win, f_win):
   delta_t = len(k_win)
   omega_sum = omega = 1.0
   k_omega_sum = k_win[0]
   for t in range(1, delta_t):
       omega *= (1 - f_win[t])
       omega_sum += omega
       k_omega_sum = k_win[t] * omega
   return k_omega_sum / omega_sum

@njit
def windows_k_omega(k_wins, f_wins):
   size = len(k_wins)
   result = np.empty(size)
   for i in range(size):
       result[i] = k_omega(k_wins[i], f_wins[i])
   return result

def kw_omega(k, depth, window, samples_day):
   n = window * samples_day  # delta_t
   f = k / depth / samples_day
   k_wins = sliding_window_view(k, n)
   f_wins = sliding_window_view(f, n)
   k_omegas = windows_k_omega(k_wins, f_wins)
   weighted_kw = np.pad(weighted_kw, (len(k)-len(k_omegas), 0))
   return weighted_kw

Here, I have split up the function into three in order to make it more comprehensible. The function k_omega is basically applying your weighted mean function to a k and f window. The function windows_k_omega is just to speed up the loop to apply the function element wise on the windows. Finally, the outer function kw_omega implements your original function interface. It uses the numpy function sliding_window_view to create the moving windows (note that this is a fancy numpy indexing under the hood, so this is not creating a copy of the original array) and performs the calculation with the helper functions and takes care of the padding of the result array (initial zeros).

A short test with your original function showed some different results, which is likely due to your np.flip calls reverse the arrays for your indexing. I just implemented the formula which you provided without checking your indexing in depth, so I leave this task to you. You should maybe call it with some dummy inputs which you can check manually.

As an additional note to your code: If you want to loop on index, you should use the build in range instead of using np.arange. Internally, python uses a generator for range instead of creating the array of indexes first to iterate over each individually. Furthermore, you should try to reduce the number of arrays which you need to create, but instead re-use them, e.g. the omega = np.zeros(n) could be created outside the outer for loop using omega = np.empty(n) and internally only initialized on each new iteration omega[:] = 0.0. Note, that all kind of memory management which is typically the speed penalty, beside array element access by index, is something which you need to do with numpy yourself, because there is no compiler which helps you, therefore I recommend using numba, which compiles your python code and helps you in many ways to make your number crunching faster.