Tags: python, scikit-learn, regression, scipy-optimize

Multivariate second order polynomial regression in Python


I am dealing with multivariate regression problems. My dataset is something like X = (nsample, nx) and Y = (nsample, ny). nx and ny vary from case to case and from dataset to dataset, so the code should handle them generically.

I would like to determine the coefficients of the multivariate polynomial regression that minimize the root mean square error. I thought of splitting the problem into ny separate regressions, so for each of them the dataset is X = (nsample, nx) and Y = (nsample, 1). For each dependent variable (Uj) the second order polynomial then has the following form:

$$U_j = \beta_0 + \sum_{i=1}^{n_x} \beta_i x_i + \sum_{i=1}^{n_x} \sum_{k=i}^{n_x} \beta_{ik}\, x_i x_k$$

I coded the function in Python as:

def func(x, nx, pars0, pars1, pars2):
    # second order polynomial evaluated at one sample x of length nx
    y = pars0                               # pars0 = bias (beta_0)
    for i in range(nx):
        y = y + pars1[i] * x[i]             # pars1 = linear coefficients (beta_i in the equation)
        for j in range(i, nx):              # only j >= i is needed
            y = y + pars2[i, j] * x[i] * x[j]
            # diagonal of pars2       = coefficients of x_i^2    (beta_ii in the equation)
            # upper triangle of pars2 = coefficients of x_i*x_j  (beta_ik in the equation)
    return y

and the root mean square error as:

def resid(nsample, nx, pars0, pars1, pars2, x, y):
    # root mean square error over the whole sample
    res = 0.0
    for i in range(nsample):
        y_pred = func(x[i], nx, pars0, pars1, pars2)   # argument order matches the definition of func
        res = res + (y_pred - y[i]) ** 2
    res = res / nsample
    return res ** 0.5

To determine the coefficients I thought of using scipy.optimize.minimize, but I could not get it to work (example_1, example_2). Any ideas or advice? Should I use sklearn?
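
For reference, here is a minimal sketch of how the two functions above might be wired into scipy.optimize.minimize by packing all coefficients into one flat vector; the helper names unpack and objective are illustrative, not from any library, and the commented-out lines show the intended usage under the assumption that x has shape (nsample, nx) and y has shape (nsample,):

import numpy as np
from scipy.optimize import minimize

def unpack(theta, nx):
    # split a flat parameter vector into bias, linear and quadratic parts
    pars0 = theta[0]
    pars1 = theta[1:1 + nx]
    pars2 = np.zeros((nx, nx))
    iu = np.triu_indices(nx)          # upper triangle incl. diagonal, as func expects
    pars2[iu] = theta[1 + nx:]
    return pars0, pars1, pars2

def objective(theta, x, y, nx, nsample):
    # RMSE as a function of the flat parameter vector, reusing resid from above
    pars0, pars1, pars2 = unpack(theta, nx)
    return resid(nsample, nx, pars0, pars1, pars2, x, y)

# nsample, nx = x.shape
# n_params = 1 + nx + nx * (nx + 1) // 2
# result = minimize(objective, np.zeros(n_params), args=(x, y, nx, nsample))
# pars0, pars1, pars2 = unpack(result.x, nx)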

EDIT: toy test data with nx = 3, ny = 1 (the first three columns are the inputs x1, x2, x3; the last column is y):

0.20    -0.02   0.20    1.0229781
0.20    -0.02   0.40    1.0218807
0.20    -0.02   0.60    1.0220439
0.20    -0.02   0.80    1.0227083
0.20    -0.02   1.00    1.0237960
0.20    -0.02   1.20    1.0255770
0.20    -0.02   1.40    1.0284888
0.20    -0.06   0.20    1.0123552
0.24    -0.02   1.40    1.0295350
0.24    -0.06   0.20    1.0125935
0.24    -0.06   0.40    1.0195798
0.24    -0.06   0.60    1.0124632
0.24    -0.06   0.80    1.0131748
0.24    -0.06   1.00    1.0141751
0.24    -0.06   1.20    1.0153533
0.24    -0.06   1.40    1.0170036
0.24    -0.10   0.20    1.0026915
0.24    -0.10   0.40    1.0058125
0.24    -0.10   0.60    1.0055921
0.24    -0.10   0.80    1.0057868
0.24    -0.10   1.00    1.0014004
0.24    -0.10   1.20    1.0026257
0.24    -0.10   1.40    1.0024578
0.30    -0.18   0.60    0.9748765
0.30    -0.18   0.80    0.9753220
0.30    -0.18   1.00    0.9740970
0.30    -0.18   1.20    0.9727272
0.30    -0.18   1.40    0.9732258
0.30    -0.20   0.20    0.9722360
0.30    -0.20   0.40    0.9687567
0.30    -0.20   0.60    0.9676569
0.30    -0.20   0.80    0.9672319
0.30    -0.20   1.00    0.9682354
0.30    -0.20   1.20    0.9674461
0.30    -0.20   1.40    0.9673747
0.36    -0.02   0.20    1.0272033
0.36    -0.02   0.40    1.0265790
0.36    -0.02   0.60    1.0271688
0.36    -0.02   0.80    1.0277286
0.36    -0.02   1.00    1.0285388
0.36    -0.02   1.20    1.0295619
0.36    -0.02   1.40    1.0310734
0.36    -0.06   0.20    1.0159603
0.36    -0.06   0.40    1.0159753
0.36    -0.06   0.60    1.0161890
0.36    -0.06   0.80    1.0153346
0.36    -0.06   1.00    1.0159790
0.36    -0.06   1.20    1.0167520
0.36    -0.06   1.40    1.0176916
0.36    -0.10   0.20    1.0048287
0.36    -0.10   0.40    1.0034699
0.36    -0.10   0.60    1.0032798
0.36    -0.10   0.80    1.0037224
0.36    -0.10   1.00    1.0059301
0.36    -0.10   1.20    1.0047114
0.36    -0.10   1.40    1.0041287
0.36    -0.14   0.20    0.9926268
0.40    -0.08   0.80    1.0089013
0.40    -0.08   1.20    1.0096265
0.40    -0.08   1.40    1.0103305
0.40    -0.10   0.20    1.0045464
0.40    -0.10   0.40    1.0041031
0.40    -0.10   0.60    1.0035650
0.40    -0.10   0.80    1.0034553
0.40    -0.10   1.00    1.0034699
0.40    -0.10   1.20    1.0030276
0.40    -0.10   1.40    1.0035284
0.40    -0.10   1.60    1.0042166
0.40    -0.14   0.20    0.9924336
0.40    -0.14   0.40    0.9914971
0.40    -0.14   0.60    0.9910082
0.40    -0.14   0.80    0.9903772
0.40    -0.14   1.00    0.9900816
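
For convenience, a minimal sketch for loading this toy data, assuming it is saved as a whitespace-separated text file (the file name toy_data.txt is illustrative):

import numpy as np

data = np.loadtxt("toy_data.txt")   # shape (nsample, nx + ny)
X = data[:, :3]                     # nx = 3 input columns
Y = data[:, 3:]                     # ny = 1 output column
nsample, nx = X.shape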

Solution

  • After some trial and error, I finally came up with a solution. The problem can be treated as linear with a change of variables: the squared and cross terms are just additional input features. I used scikit-learn to build the model, and after some tests on real cases it works really well.
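
The answer only states that scikit-learn was used after a change of variables, so the following is just a minimal sketch of how that could look: PolynomialFeatures generates the squared and interaction columns, and an ordinary LinearRegression is fitted in that expanded feature space (X of shape (nsample, nx), Y of shape (nsample, ny) as above):

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=True),
    LinearRegression(fit_intercept=False),   # bias column already added by PolynomialFeatures
)
# model.fit(X, Y)                            # Y may have ny columns; each gets its own coefficients
# Y_pred = model.predict(X)
# coefs = model.named_steps['linearregression'].coef_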