Tags: python, parallel-processing, gradient-descent

Storing parameter values in every step of the custom gradient descent algorithm in Python


I'm trying to make a custom gradient descent estimator; however, I am running into an issue with storing the parameter values at every step of the gradient descent algorithm. Here is the code skeleton:

from numpy import *
import pandas as pd
from joblib import Parallel, delayed
from multiprocessing import cpu_count

ftemp = zeros((2, )) 
stemp = empty([1, ], dtype='<U10') 
la = 10

vals = pd.DataFrame(index=range(la), columns=['a', 'b', 'string'])

def sfun(k1, k2, k3, string):
    a = k1*k2
    b = k2*k3
    s = string

    nums = [a, b]
    strs = [s]

    return(nums, strs)

def store(inp):
    r = sfun(inp[0], inp[1], inp[2], inp[3])

    ftemp = append(ftemp, asarray(r[0]), axis = 0)
    stemp = append(stemp, asarray(r[1]), axis = 0)
    
    return(ftemp, stemp)

for l in range(la):
    inputs = [(2, 3, 4, 'he'),
              (4, 6, 2, 'je'), 
              (2, 7, 5, 'ke')]

    Parallel(n_jobs = cpu_count())(delayed(store)(i) for i in inputs)

    vals.iloc[l, 0:2] = ftemp[0, 0], ftemp[0, 1]
    vals.iloc[l, 2] = stemp[0]

    d = ftemp[2, 0]-ftemp[0, 0]
    

Note: most of the gradient descent code is removed because I have no issues with that part. The main issue I have is storing the values at each step.

sfun() is the loss function (I know it doesn't look like one here) and store() is just an attempt to store the parameter values at each step.

The important aspect here is that I want to parallelize the process, as sfun() is computationally expensive, and the difficulty is that I want to save the values from all parallel runs.

I tried solving this in many different ways, but I always get a different error.


Solution

  • There is no need for a temporary storage array; the results of the Parallel() call can be stored directly:

    a = Parallel(n_jobs = cpu_count())(delayed(store)(i) for i in inputs)
    

    Most importantly, a is populated in the same order that the inputs are given.
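    A minimal runnable sketch of this approach, calling sfun() directly instead of the store() wrapper (the globals ftemp/stemp are no longer needed). The gradient update itself is omitted here, as in the question; the outer loop just shows where each step's values land in the DataFrame:

    ```python
    import pandas as pd
    from joblib import Parallel, delayed
    from multiprocessing import cpu_count

    def sfun(k1, k2, k3, string):
        # toy stand-in for the expensive loss function
        return ([k1 * k2, k2 * k3], [string])

    inputs = [(2, 3, 4, 'he'),
              (4, 6, 2, 'je'),
              (2, 7, 5, 'ke')]

    la = 10
    vals = pd.DataFrame(index=range(la), columns=['a', 'b', 'string'])

    for l in range(la):
        # results come back as a plain list, in the same order as `inputs`
        results = Parallel(n_jobs=cpu_count())(delayed(sfun)(*i) for i in inputs)

        nums, strs = results[0]      # output of the first parallel run
        vals.iloc[l, 0:2] = nums     # store a and b for this step
        vals.iloc[l, 2] = strs[0]    # store the string for this step

    print(vals.head())
    ```

    Because Parallel() returns the results as an ordered list, indexing into it replaces the append-to-global pattern, which does not work across worker processes anyway (each worker gets its own copy of module-level state).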