python pandas scipy-optimize

Loop over pandas data frame in order to solve equation with fsolve in python


I have read a CSV input file into a data frame. I would like to loop over each row of the data frame and assign each column's value to a variable. These variables then go into a system of equations that I would like to solve once per row, using that row's values. In the end I would like a solved output with 4 values for each row of the data frame.

import pandas as pd
df = pd.read_csv('path/testfile.csv', delimiter='\t', header=None)
print(df)

           0         1         2         3         4         5
0   0.227996  0.337029  0.238164  0.183009  0.085747  0.134129
1   0.247891  0.335556  0.272129  0.187329  0.085921  0.128372
2   0.264761  0.337778  0.245918  0.183212  0.080493  0.122786
3   0.305061  0.337778  0.204265  0.208453  0.071558  0.083683
4   0.222749  0.337029  0.209715  0.084253  0.142014  0.234673
5   0.190816  0.337029  0.291872  0.041575  0.463764  0.053193
6   0.299625  0.337029  0.206064  0.200905  0.072955  0.092528
7   0.259740  0.340045  0.202792  0.156021  0.087506  0.148796

I have some variables:

for index, row in df.iterrows():
    K=df[0]
    L=df[1]
    M=df[2]
    N=df[3]
    P=df[4]
    F=df[5]
    H=1-K
    def f2(z):
      a=z[0]
      b=z[1]
      c=z[2]
      d=z[3]
      f=np.zeros(4)
      f[0]=K*a*((1-c)*L+(b-d)*M)-N
      f[1] =P+a*c*d*b
      f[2]= F+H*c*a*d+b
      f[3]= H+F+P*a*b*c*d
      return f
    z= fsolve(f2,[1,1,1,1])

print(z)

But I cannot manage to link the for loop to the equations so that each row of the data frame is used as input. In the end it gives me only 4 values, not 4 values for each row.

Does somebody know how to do it?


Solution

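  • Here is one way; I use the notation of your original question. The loop in the question reads whole columns (df[0], df[1], ...) rather than the current row's values, and z is overwritten on every pass, so only one result survives. I slightly rewrote your function so that it takes the row's values as extra arguments, but it performs the same operations. For efficiency, I don't use iterrows; instead I unpack all the columns into arrays and use a simple for loop. At the end, res is a list of arrays, one solved answer per row.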

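    import numpy as np
    from scipy.optimize import fsolve

    # rewrite your function so the row's values are passed as extra arguments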
    def f2_bis(z, Ai, Bi, Ci, Di, Ei, Fi, Hi):
        a, b, c, d = z
        return np.array([
            Ai*a*((1-c)*Bi+(b-d)*Ci)-Di,
            Ei+a*c*d*b,
            Fi+Hi*c*a*d+b,
            Hi+Fi+Ei*a*b*c*d
        ])
    
    # unpack the values of each column into separate arrays
    A, B, C, D, E, F = df.to_numpy().T
    H = 1-A
    
    # get the result for each "row"
    res = [
        fsolve(f2_bis, [1,1,1,1], args=(Ai, Bi, Ci, Di, Ei, Fi, Hi))
        for Ai, Bi, Ci, Di, Ei, Fi, Hi in zip(A, B, C, D, E, F, H)
    ]
    res
    # [array([ 1.25482583, -0.32608114, -0.17861197, -0.98296457]),
    #  array([ 0.70750447, -0.41512857, -0.30218114, -1.80533338]),
    #  array([-2.91283478, -0.41076736,  1.41022472, -0.09615889]),
    #  array([ 2.87736785,  0.25256582,  0.6107988 , -0.26507222]),
    #  array([-0.532438  ,  0.34016552,  6.64918304,  0.18908195]),
    #  array([ 1.1062844 ,  0.73110855, -0.65958519,  1.32070547]),
    #  array([1., 1., 1., 1.]),
    #  array([1., 1., 1., 1.])]
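
The last two results above equal the starting guess [1, 1, 1, 1], which suggests fsolve may not have converged for those rows. Note also that the second and fourth equations both fix the product a*b*c*d (to -E and to -(H+F)/E respectively), so they can only hold together when E**2 == H + F, which is not the case for the sample rows. It is therefore worth checking the solver's status for every row. The following is a minimal sketch of one way to do that, assuming the same equations and starting guess; the names residuals and solve_rows and the output column names are hypothetical, and only the first three sample rows are used.

```python
import numpy as np
import pandas as pd
from scipy.optimize import fsolve

# First three rows of the sample data from the question
df = pd.DataFrame([
    [0.227996, 0.337029, 0.238164, 0.183009, 0.085747, 0.134129],
    [0.247891, 0.335556, 0.272129, 0.187329, 0.085921, 0.128372],
    [0.264761, 0.337778, 0.245918, 0.183212, 0.080493, 0.122786],
])


def residuals(z, A, B, C, D, E, F):
    """Same four equations as f2_bis, with H computed from A."""
    a, b, c, d = z
    H = 1 - A
    return np.array([
        A*a*((1 - c)*B + (b - d)*C) - D,
        E + a*c*d*b,
        F + H*c*a*d + b,
        H + F + E*a*b*c*d,
    ])


def solve_rows(frame, x0=(1.0, 1.0, 1.0, 1.0)):
    """Run fsolve once per row and record its status alongside the result."""
    records = []
    for row in frame.to_numpy():
        # full_output=True returns (solution, info dict, status flag, message)
        sol, info, ier, msg = fsolve(
            residuals, x0, args=tuple(row), full_output=True
        )
        records.append({
            "a": sol[0], "b": sol[1], "c": sol[2], "d": sol[3],
            "converged": ier == 1,  # ier == 1 is fsolve's "solution found" status
            "max_residual": np.abs(info["fvec"]).max(),  # largest |f_i| at sol
        })
    return pd.DataFrame(records, index=frame.index)


result = solve_rows(df)
print(result)
```

Rows where converged is False, or where max_residual is far from zero, should not be treated as solutions of the system; a different starting guess will not help if the equations themselves are inconsistent for that row.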