python machine-learning regression non-linear-regression gpy

Using GPy Multiple-output coregionalized prediction


I have recently been facing a problem for which I believe a multiple-output GP might be a good candidate. At the moment I am applying a single-output GP to my data, and as the dimensionality increases my results keep getting worse. I tried multiple-output regression with scikit-learn and got better results in higher dimensions, but I believe GPy is more complete for such tasks and would give me more control over the model. For the single-output GP I set up the kernel as follows:

kernel = GPy.kern.RBF(input_dim=4, variance=1.0, lengthscale=1.0, ARD = True)
m = GPy.models.GPRegression(X, Y_single_output, kernel = kernel, normalizer = True) 
m.optimize_restarts(num_restarts=10)  

In the example above, X has shape (20, 4) and Y has shape (20, 1).

I took the multiple-output implementation from Introduction to Multiple Output Gaussian Processes. I prepared the data as in that example, setting X_mult_output to shape (80, 2), with the second column holding the input indices, and rearranging Y to shape (80, 1).

kernel = GPy.kern.RBF(1,lengthscale=1, ARD = True)**GPy.kern.Coregionalize(input_dim=1,output_dim=4, rank=1)
m = GPy.models.GPRegression(X_mult_output,Y_mult_output, kernel = kernel, normalizer = True)
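For readers unfamiliar with this layout, here is a minimal NumPy-only sketch of how a stacked (80, 2) input and (80, 1) target can be assembled. It assumes four outputs, each observed at 20 one-dimensional inputs; the data, `X_list` and `Y_list` are hypothetical and are not the asker's actual arrays.

```python
import numpy as np

# Hypothetical data: 4 outputs, each observed at 20 one-dimensional inputs.
n_points, n_outputs = 20, 4
rng = np.random.default_rng(0)
X_list = [rng.uniform(0, 10, size=(n_points, 1)) for _ in range(n_outputs)]
Y_list = [np.sin(x) + 0.1 * rng.standard_normal(x.shape) for x in X_list]

# Append an integer index column so each row reads [input value, index].
X_mult_output = np.vstack(
    [np.hstack([x, np.full_like(x, i)]) for i, x in enumerate(X_list)]
)
# Stack the targets in the same order as the inputs.
Y_mult_output = np.vstack(Y_list)

print(X_mult_output.shape)  # (80, 2)
print(Y_mult_output.shape)  # (80, 1)
```

The point of the sketch is only the shape bookkeeping: the last column is an index, so any array later passed to the model needs that column as well.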

OK, everything seems to work so far. Now I want to predict values, but I do not seem to be able to. From what I understood, you can predict a single output by specifying the input index in the Y_metadata argument. As I have 4 inputs, I set up the array I want to predict as follows:

x_pred = np.array([3,2,2,4])

Then, I imagine that I have to predict each value of my x_pred array separately, as shown in Coregionalized Regression Model (vector-valued regression):

Y_metadata1 = {'output_index': np.array([[0]])}

y1_pred = m.predict(np.array(x[0]).reshape(1,-1),Y_metadata=Y_metadata1)

The problem is that I keep getting the following error:

IndexError: index 1 is out of bounds for axis 1 with size 1

Any suggestions on how to overcome this problem? Is there a mistake in my implementation?

Traceback:

Traceback (most recent call last):

  File "<ipython-input-9-edb25bc29817>", line 36, in <module>
    y1_pred = m.predict(np.array(x[0]).reshape(1,-1),Y_metadata=Y_metadata1)

  File "c:\users\johndoe\desktop\modules\sheffieldml-gpy-v1.9.9-0-g92f2e87\sheffieldml-gpy-92f2e87\GPy\core\gp.py", line 335, in predict
    mean, var = self._raw_predict(Xnew, full_cov=full_cov, kern=kern)

  File "c:\users\johndoe\desktop\modules\sheffieldml-gpy-v1.9.9-0-g92f2e87\sheffieldml-gpy-92f2e87\GPy\core\gp.py", line 292, in _raw_predict
    mu, var = self.posterior._raw_predict(kern=self.kern if kern is None else kern, Xnew=Xnew, pred_var=self._predictive_variable, full_cov=full_cov)

  File "c:\users\johndoe\desktop\modules\sheffieldml-gpy-v1.9.9-0-g92f2e87\sheffieldml-gpy-92f2e87\GPy\inference\latent_function_inference\posterior.py", line 276, in _raw_predict
    Kx = kern.K(pred_var, Xnew)

  File "c:\users\johndoe\desktop\modules\sheffieldml-gpy-v1.9.9-0-g92f2e87\sheffieldml-gpy-92f2e87\GPy\kern\src\kernel_slice_operations.py", line 109, in wrap
    with _Slice_wrap(self, X, X2) as s:

  File "c:\users\johndoe\desktop\modules\sheffieldml-gpy-v1.9.9-0-g92f2e87\sheffieldml-gpy-92f2e87\GPy\kern\src\kernel_slice_operations.py", line 65, in __init__
    self.X2 = self.k._slice_X(X2) if X2 is not None else X2

  File "<decorator-gen-140>", line 2, in _slice_X

  File "C:\Users\johndoe\AppData\Roaming\Python\Python37\site-packages\paramz\caching.py", line 283, in g
    return cacher(*args, **kw)

  File "C:\Users\johndoe\AppData\Roaming\Python\Python37\site-packages\paramz\caching.py", line 172, in __call__
    return self.operation(*args, **kw)

  File "c:\users\johndoe\desktop\modules\sheffieldml-gpy-v1.9.9-0-g92f2e87\sheffieldml-gpy-92f2e87\GPy\kern\src\kern.py", line 117, in _slice_X
    return X[:, self._all_dims_active]

IndexError: index 1 is out of bounds for axis 1 with size 1




Solution

  • problem

    You defined the kernel with X of dimension (-1, 4) and Y of dimension (-1, 1), but you are passing it a prediction input of dimension (1, 1): the first element of x_pred reshaped to (1, 1).

    solution

    Pass the whole x_pred to the model for prediction (an input of dimension (-1, 4)):

    Y_metadata1 = {'output_index': np.array([[0]])}
    y1_pred = m.predict(np.array(x_pred).reshape(1,-1), Y_metadata=Y_metadata1)
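If `m` is the coregionalized model from the question, which was trained on the two-column `X_mult_output`, it may be clearer to build the prediction input with an explicit index column. The following is a hedged, NumPy-only sketch; `with_output_index` is a hypothetical helper, the query value is made up, and the final `predict` call is left as a comment because no model is fitted here.

```python
import numpy as np

def with_output_index(x_new, output_index):
    """Return x_new as a column with the output index appended as a second column."""
    x_col = np.asarray(x_new, dtype=float).reshape(-1, 1)
    idx_col = np.full_like(x_col, output_index)
    return np.hstack([x_col, idx_col])

# Hypothetical query: evaluate output 0 at the input value 3.
X_new = with_output_index([3.0], output_index=0)
Y_metadata = {'output_index': X_new[:, 1:].astype(int)}

print(X_new)        # [[3. 0.]]
print(X_new.shape)  # (1, 2)

# With a fitted coregionalized model `m` (not built here), the call would be:
# y_pred, y_var = m.predict(X_new, Y_metadata=Y_metadata)
```

The number of columns in `X_new` then matches what the model was trained on, and the index column states which output is being queried.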
    

    DIY

    Before running all of your code together, try running the pieces separately so they are easy to debug; then you can make your code small and clean. The example below is debug code for your problem:

    Y_metadata1 = {'output_index': np.array([[0]])}
    a = np.array(x_pred[0]).reshape(1,-1)
    print(a.shape)
    y1_pred = m.predict(a,Y_metadata=Y_metadata1)
    

    The output is (1, 1) followed by the error, which makes it obvious that the error comes from the input dimension.

    Reading the error also helps. The traceback says there is a problem in kern.K(pred_var, Xnew), so the error probably comes from the kernel; then it says it comes from X[:, self._all_dims_active], so the error probably comes from the dimensions of X. A little experimenting with the dimensions of x will then give you the idea.
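The slicing failure can be reproduced without GPy. This is a small NumPy sketch in which `all_dims_active` is a stand-in for the kernel's active dimensions; its value (columns 0 and 1) is an assumption made for illustration, not something read from the model.

```python
import numpy as np

x_pred = np.array([3, 2, 2, 4])

# Same reshaping as in the failing call: one element becomes a (1, 1) array.
a = np.array(x_pred[0]).reshape(1, -1)
print(a.shape)  # (1, 1)

# Stand-in for the kernel's active dimensions (assumed to be columns 0 and 1).
all_dims_active = np.array([0, 1])

message = None
try:
    a[:, all_dims_active]  # column 1 does not exist in a (1, 1) array
except IndexError as err:
    message = str(err)
    print(message)

# The full vector reshaped to one row has enough columns for the same slice.
b = x_pred.reshape(1, -1)
print(b.shape)                      # (1, 4)
print(b[:, all_dims_active].shape)  # (1, 2)
```

Selecting columns by index fails as soon as any requested column is missing, so the first thing to check is that the prediction input has at least as many columns as the training input.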

    Hopefully this still helps after 7 days!