python, scikit-learn, gaussian-process

How to fix ConvergenceWarning in Gaussian process regression in sklearn?


I am trying to fit a sklearn Gaussian process regressor to my data. The data has periodicity but no mean trend, so I defined a kernel similar to the one in the tutorial on the Mauna Loa data, without the long-term trend, as follows:

from sklearn.gaussian_process.kernels import (RBF, ExpSineSquared, 
                                              RationalQuadratic, WhiteKernel)
from sklearn.gaussian_process import GaussianProcessRegressor as GPR
import numpy as np

# Models the periodicity
seasonal_kernel = (
    2.0**2
    * RBF(length_scale=100.0, length_scale_bounds=(1e-2,1e7))
    * ExpSineSquared(length_scale=1.0, length_scale_bounds=(1e-2,1e7), 
                     periodicity=1.0, periodicity_bounds="fixed")
)

# Models small variations
irregularities_kernel = 0.5**2 * RationalQuadratic(length_scale=1.0, 
                                length_scale_bounds=(1e-2,1e7), alpha=1.0)

# Models noise
noise_kernel = (
    0.1**2 * RBF(length_scale=0.1, length_scale_bounds=(1e-2,1e7))
    + WhiteKernel(noise_level=0.1**2, noise_level_bounds=(1e-5, 1e5))
)

co2_kernel = (
    seasonal_kernel + irregularities_kernel + noise_kernel
)

Then I use the kernel to define a regressor and fit the data:

gpr = GPR(n_restarts_optimizer=10, kernel=co2_kernel, alpha=150, normalize_y=False)
for x,y in zip(x_list, y_list):
    gpr.fit(x,y)

However, during fit I get multiple ConvergenceWarnings. They all look like the following:

C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\gaussian_process\kernels.py:430: ConvergenceWarning: The optimal value found for dimension 0 of parameter k1__k2__k1__constant_value is close to the specified upper bound 100000.0. Increasing the bound and calling fit again may find a better value.
C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\gaussian_process\kernels.py:430: ConvergenceWarning: The optimal value found for dimension 0 of parameter k2__k1__k1__constant_value is close to the specified upper bound 100000.0. Increasing the bound and calling fit again may find a better value.
C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\gaussian_process\kernels.py:430: ConvergenceWarning: The optimal value found for dimension 0 of parameter k1__k2__k2__alpha is close to the specified upper bound 100000.0. Increasing the bound and calling fit again may find a better value.
C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\gaussian_process\kernels.py:430: ConvergenceWarning: The optimal value found for dimension 0 of parameter k1__k1__k1__k1__constant_value is close to the specified upper bound 100000.0. Increasing the bound and calling fit again may find a better value.
C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\gaussian_process\kernels.py:420: ConvergenceWarning: The optimal value found for dimension 0 of parameter k1__k1__k1__k2__length_scale is close to the specified lower bound 0.01. Decreasing the bound and calling fit again may find a better value.
C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\gaussian_process\kernels.py:430: ConvergenceWarning: The optimal value found for dimension 0 of parameter k1__k2__k1__constant_value is close to the specified upper bound 100000.0. Increasing the bound and calling fit again may find a better value.

I managed to fix some of them by blanket-adding the length_scale_bounds argument to every function within the kernel, but I'm not sure whether I've set overly wide bounds that needlessly slow down parts of the kernel that were running just fine, and I don't know how to fix the problem with alpha or with the constant values. Searching for the warnings online did not help.

I know that the model is not being fitted properly because the Gaussian process regressor is performing far worse than a simple SVR, despite the latter being much faster. Does anybody know how I can:

  1. Associate each warning with a specific subkernel within the wider kernel?
  2. Fix the warnings for alpha and the constant values?

Solution

  • It took me a while, but I found the solution in the documentation for the kernel hyperparameter API. The hyperparameters of the whole kernel can be listed as follows:

    for hp in co2_kernel.hyperparameters:
        print('co2',hp)
    

    which outputs the following:

    co2 Hyperparameter(name='k1__k1__k1__k1__constant_value', value_type='numeric', bounds=array([[1.e-05, 1.e+05]]), n_elements=1, fixed=False)
    co2 Hyperparameter(name='k1__k1__k1__k2__length_scale', value_type='numeric', bounds=array([[1.e-05, 1.e+05]]), n_elements=1, fixed=False)
    co2 Hyperparameter(name='k1__k1__k2__length_scale', value_type='numeric', bounds=array([[1.e-05, 1.e+05]]), n_elements=1, fixed=False)
    co2 Hyperparameter(name='k1__k1__k2__periodicity', value_type='numeric', bounds='fixed', n_elements=1, fixed=True)
    co2 Hyperparameter(name='k1__k2__k1__constant_value', value_type='numeric', bounds=array([[1.e-05, 1.e+05]]), n_elements=1, fixed=False)
    co2 Hyperparameter(name='k1__k2__k2__alpha', value_type='numeric', bounds=array([[1.e+02, 1.e+07]]), n_elements=1, fixed=False)
    co2 Hyperparameter(name='k1__k2__k2__length_scale', value_type='numeric', bounds=array([[1.e-05, 1.e+05]]), n_elements=1, fixed=False)
    co2 Hyperparameter(name='k2__k1__k1__constant_value', value_type='numeric', bounds=array([[1.e-05, 1.e+05]]), n_elements=1, fixed=False)
    co2 Hyperparameter(name='k2__k1__k2__length_scale', value_type='numeric', bounds=array([[1.e-05, 1.e+05]]), n_elements=1, fixed=False)
    co2 Hyperparameter(name='k2__k2__noise_level', value_type='numeric', bounds=array([[1.e-09, 1.e+01]]), n_elements=1, fixed=False)
    

    The parameters correspond to the arguments of the various pieces of the kernel. As the documentation points out, "Note that due to the nested structure of kernels (by applying kernel operators, see below), the names of kernel parameters might become relatively complicated. In general, for a binary kernel operator, parameters of the left operand are prefixed with k1__ and parameters of the right operand with k2__." The branches are read starting from the outermost operator, which for a chain of operations with the same precedence is the rightmost one, since Python evaluates a + b + c as (a + b) + c.
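    As a small illustration of the prefix rule, the sketch below builds two toy kernels (they are not part of the original problem, only placeholders) and lists the hyperparameter names that sklearn generates for them:

```python
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Single binary operator: the left operand is k1, the right operand is k2.
toy_kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1.0)
toy_names = [hp.name for hp in toy_kernel.hyperparameters]
print(toy_names)

# A chain a + b + c is evaluated as (a + b) + c, so the rightmost "+" is the
# outermost operator: c gets the prefix k2__, while a and b sit under k1__.
chain_kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1.0) + RBF(length_scale=10.0)
chain_names = [hp.name for hp in chain_kernel.hyperparameters]
print(chain_names)
```

    The first list contains k1__length_scale and k2__noise_level; in the second, the last RBF appears as k2__length_scale, while the first two terms are nested under k1__k1__ and k1__k2__.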

    For example, the hyperparameters for the seasonal kernel start with k1__k1__ because to get there we need to take the left operand of both outer additions: first the one between (seasonal_kernel + irregularities_kernel) and noise_kernel, and then the one between seasonal_kernel and irregularities_kernel. The seasonal kernel is itself a product, (2.0**2 * RBF) * ExpSineSquared. From there we can take the left operand twice more to get to the 2.0**2 (which gets converted to a ConstantKernel), whose single hyperparameter is k1__k1__k1__k1__constant_value, or take the left operand and then the right one to get to the RBF kernel, whose parameter is k1__k1__k1__k2__length_scale. Another example: the parameter k2__k2__noise_level is the noise level of the WhiteKernel within noise_kernel, because you get there by first taking the right operand of the addition between (seasonal_kernel + irregularities_kernel) and noise_kernel, then the right operand again of the addition within noise_kernel.
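    The same walk can be automated. The sketch below is only one possible way to do it: locate is a hypothetical helper (not part of sklearn) that follows the k1/k2 attributes that Sum and Product kernels expose. The kernel is rebuilt as in the question, but with default bounds to keep the example short.

```python
from sklearn.gaussian_process.kernels import (RBF, ExpSineSquared,
                                              RationalQuadratic, WhiteKernel)

def locate(kernel, hp_name):
    """Hypothetical helper: follow the k1/k2 prefixes in hp_name and
    return (subkernel, parameter_name)."""
    *path, param = hp_name.split("__")
    node = kernel
    for step in path:
        node = getattr(node, step)  # Sum and Product kernels expose .k1 and .k2
    return node, param

# Same structure as the kernel in the question (bounds left at their defaults).
seasonal_kernel = (2.0**2 * RBF(length_scale=100.0)
                   * ExpSineSquared(length_scale=1.0, periodicity=1.0,
                                    periodicity_bounds="fixed"))
irregularities_kernel = 0.5**2 * RationalQuadratic(length_scale=1.0, alpha=1.0)
noise_kernel = 0.1**2 * RBF(length_scale=0.1) + WhiteKernel(noise_level=0.1**2)
co2_kernel = seasonal_kernel + irregularities_kernel + noise_kernel

# Parameter names copied from the warnings in the question.
warned = ["k1__k2__k1__constant_value", "k2__k1__k1__constant_value",
          "k1__k2__k2__alpha", "k1__k1__k1__k2__length_scale"]
for name in warned:
    subkernel, param = locate(co2_kernel, name)
    print(name, "->", type(subkernel).__name__, param, subkernel)
```

    With this structure, k1__k2__k1__constant_value resolves to the ConstantKernel created from 0.5**2, k1__k2__k2__alpha to the RationalQuadratic, k2__k1__k1__constant_value to the ConstantKernel created from 0.1**2, and k1__k1__k1__k2__length_scale to the RBF inside the seasonal kernel.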

    This feels impossibly complicated at first but gets easier pretty quickly. Once we know which parameters within which kernels are problematic, we can solve the problem by extending the corresponding _bounds argument accordingly. For example, I could solve the first warning by replacing 0.5**2 with ConstantKernel(constant_value=1, constant_value_bounds=(1e3, 1e6)).
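    For the alpha and constant-value warnings specifically, a minimal sketch of the same idea is shown below. The bound values (1e-5, 1e7) are illustrative assumptions rather than recommendations, and the starting values are chosen to lie inside the bounds; alpha_bounds and constant_value_bounds are the constructor arguments of RationalQuadratic and ConstantKernel respectively.

```python
from sklearn.gaussian_process.kernels import ConstantKernel, RationalQuadratic

# Replace the bare 0.5**2 factor with an explicit ConstantKernel so that its
# bounds can be set, and widen alpha's bounds on the RationalQuadratic.
# The upper bound 1e7 is an illustrative guess, not a tuned value.
irregularities_kernel = (
    ConstantKernel(constant_value=0.5**2, constant_value_bounds=(1e-5, 1e7))
    * RationalQuadratic(length_scale=1.0, alpha=1.0, alpha_bounds=(1e-5, 1e7))
)

# Check that the new bounds were picked up before refitting.
new_bounds = {hp.name: hp.bounds for hp in irregularities_kernel.hyperparameters}
for name, bounds in new_bounds.items():
    print(name, bounds)
```

    Printing the hyperparameters again after the change is a quick way to confirm that the intended subkernel, and only that one, received the wider bounds.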