Tags: python, numpy, machine-learning, scikit-learn, gaussian-process

Error in using Gaussian Process regression in sklearn python


I have just started learning Python and am trying to implement Gaussian process regression using the sklearn library. I tried to follow the examples available here with my own data points. However, I get an error when I run the line y_pred, std = model.predict(X_te, return_std=True). The error is: 'XA and XB must have the same number of columns (i.e. feature dimension.)'.

I don't know where I made my mistake, please help and thanks in advance.

A sample of the input and output data is given below:

X_tr= [10.8204  7.67418 7.83013 8.30996 8.1567  6.94831 14.8673 7.69338 7.67702 12.7542 11.847] 
y_tr= [1965.21  854.386 909.126 1094.06 1012.6  607.299 2294.55 866.316 822.948 2255.32 2124.67]
X_te= [7.62022  13.1943 7.76752 8.36949 7.86459 7.16032 12.7035 8.99822 6.32853 9.22345 11.4751]

X_tr and y_tr are the training data points and X_te holds the test inputs. All three are reshaped values and have a type of 'Array of float64'.

Here is a sample of my code:

import sklearn.gaussian_process as gp

kernel = gp.kernels.ConstantKernel(1.0, (1e-1, 1e3)) * gp.kernels.RBF(10.0, (1e-3, 1e3))

model = gp.GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10, alpha=0.1, normalize_y=True)

# data reshape
X_tr = X_tr.values.reshape(1,-1)
y_tr = y_tr.values.reshape(1,-1)

model.fit(X_tr, y_tr)
params = model.kernel_.get_params()

X_te = X_te.values.reshape(1,-1)

y_pred, std = model.predict(X_te, return_std=True)

Solution

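  • This works. I changed your data from pandas to numpy arrays and fixed the reshaping issue that caused your error. scikit-learn expects X with shape (n_samples, n_features). reshape(1,-1) turns each array into a single sample with as many features as it has points, so the training and test arrays end up with different column counts whenever their lengths differ. reshape(-1,1) gives one feature per sample instead.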

    import numpy as np
    
    X_tr= np.array([10.8204,  7.67418, 7.83013, 8.30996, 8.1567,  6.94831, 14.8673, 7.69338, 7.67702, 12.7542, 11.847])
    y_tr= np.array([1965.21,  854.386, 909.126, 1094.06, 1012.6,  607.299, 2294.55, 866.316, 822.948, 2255.32, 2124.67])
    X_te= np.array([7.62022, 13.1943, 7.76752, 8.36949, 7.86459, 7.16032, 12.7035, 8.99822, 6.32853, 9.22345, 11.4751])
    
    import sklearn.gaussian_process as gp
    
    kernel = gp.kernels.ConstantKernel(1.0, (1e-1, 1e3)) * gp.kernels.RBF(10.0, (1e-3, 1e3))
    
    model = gp.GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10, alpha=0.1, normalize_y=True)
    
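    # data reshape: X must be 2-D with shape (n_samples, n_features)
    X_tr = X_tr.reshape(-1,1)
    # y_tr stays 1-D with shape (n_samples,), so no reshape is needed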
    
    model.fit(X_tr, y_tr)
    params = model.kernel_.get_params()
    
    X_te = X_te.reshape(-1,1)
    
    y_pred, std = model.predict(X_te, return_std=True)
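
To see why the two reshapes behave differently, here is a minimal sketch that only inspects array shapes. The short arrays and the helper same_feature_count are made up for illustration and are not part of the original question; the arrays deliberately have different lengths, as training and test sets usually do.

```python
import numpy as np

# Hypothetical inputs of different lengths (illustration only).
X_tr = np.array([10.8204, 7.67418, 7.83013, 8.30996])
X_te = np.array([7.62022, 13.1943, 7.76752])

# reshape(1, -1): one row, so each array becomes a single sample
# whose number of features equals its length.
row_tr = X_tr.reshape(1, -1)
row_te = X_te.reshape(1, -1)
print(row_tr.shape, row_te.shape)  # (1, 4) (1, 3)

# reshape(-1, 1): one column, so each value is its own sample
# with exactly one feature.
col_tr = X_tr.reshape(-1, 1)
col_te = X_te.reshape(-1, 1)
print(col_tr.shape, col_te.shape)  # (4, 1) (3, 1)


def same_feature_count(a, b):
    """Hypothetical helper: True if two 2-D arrays have equal column counts."""
    return a.shape[1] == b.shape[1]


print(same_feature_count(row_tr, row_te))  # False
print(same_feature_count(col_tr, col_te))  # True
```

With the row layout, the training and test arrays disagree on the number of columns, which is the kind of mismatch the error message complains about. With the column layout, both have one feature regardless of how many points they contain. If the data lives in a pandas Series, as in the question, X_tr.values.reshape(-1, 1) should give the same column layout.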