Tags: python, data-science, gpy

Most significant input dimensions for GPy.GPCoregionalizedRegression?


I have successfully trained a multi-output Gaussian Process model using a GPy.models.GPCoregionalizedRegression model from the GPy package. The model has ~25 inputs and 6 outputs.

The underlying kernel is a GPy.util.multioutput.ICM kernel consisting of a RationalQuadratic kernel (GPy.kern.RatQuad) and the GPy.kern.Coregionalize kernel.

I am now interested in the feature importance for each individual output. The RatQuad kernel provides an ARD=True (Automatic Relevance Determination) keyword, which allows one to obtain the feature importance in a single-output model (this is also what the get_most_significant_input_dimensions() method of a GPy model exploits).

However, calling the get_most_significant_input_dimensions() method on the GPy.models.GPCoregionalizedRegression model gives me a single list of indices, which I assume are the most significant inputs for all outputs combined.

How can I calculate/obtain the lengthscale values or most significant features for each individual output of the model?


Solution

  • The problem is the model itself. The intrinsic coregionalization model (ICM) is set up such that all outputs are driven by a single shared ("latent") Gaussian Process. Thus, calling get_most_significant_input_dimensions() on a GPy.models.GPCoregionalizedRegression model can only give you one set of input dimensions that is significant for all outputs together.
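    To make the shared-latent-GP point concrete: the ICM covariance factorizes as K((x, i), (x', j)) = B[i, j] * k(x, x'), i.e. one input kernel k (with one set of ARD lengthscales) mixed across outputs by a coregionalization matrix B. A minimal NumPy sketch of this structure, with a toy kernel matrix and a made-up rank-1 B (illustration only, not GPy code):

```python
import numpy as np

# Toy Gram matrix of a single shared (latent) kernel k over 4 inputs.
# All outputs reuse this one matrix, hence one set of lengthscales.
n_points, n_outputs = 4, 2
diffs = np.subtract.outer(np.arange(n_points), np.arange(n_points))
Kx = np.exp(-0.5 * diffs.astype(float) ** 2)

# Coregionalization matrix B = W W^T + diag(kappa), here rank 1
# with made-up values.
W = np.array([[1.0], [0.5]])
B = W @ W.T + np.diag([0.1, 0.1])

# Full ICM covariance over all (input, output) pairs.
K_icm = np.kron(B, Kx)
print(K_icm.shape)  # (8, 8)
```

    Because only B differs between outputs, no per-output lengthscales exist to be ranked.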

    The solution is to use a GPy.util.multioutput.LCM kernel, which is defined as a sum of ICM kernels built from a list of individual (latent) GP kernels. It works as follows:

    import GPy
    
    # Your data
    # x = ...  # shape (n_samples, n_features)
    # y = ...  # shape (n_samples, n_outputs)
    
    # # ICM case (one shared latent GP for all outputs)
    # kernel = GPy.util.multioutput.ICM(input_dim=x.shape[1],
    #                                   num_outputs=y.shape[1],
    #                                   kernel=GPy.kern.RatQuad(input_dim=x.shape[1], ARD=True))
    
    # LCM case (one latent GP kernel per output)
    rank = 1  # rank of the coregionalization matrix W
    k_list = [GPy.kern.RatQuad(input_dim=x.shape[1], ARD=True) for _ in range(y.shape[1])]
    kernel = GPy.util.multioutput.LCM(input_dim=x.shape[1], num_outputs=y.shape[1],
                                      W_rank=rank, kernels_list=k_list)
    

    A reshaping of the data is needed (this is also necessary for the ICM model and is thus independent of the scope of this question; see here for details):

    # Reshaping data to fit GPCoregionalizedRegression
    # (reshape_for_coregionalized_regression is a placeholder for the
    # reshaping described in the link above)
    xx = reshape_for_coregionalized_regression(x)
    yy = reshape_for_coregionalized_regression(y)
    
    m = GPy.models.GPCoregionalizedRegression(xx, yy, kernel=kernel)
    m.optimize()
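    The reshaping referenced above amounts to stacking the per-output data blocks and appending a column holding the output index (GPy also provides GPy.util.multioutput.build_XY for this when X and Y are given as lists). A NumPy sketch with made-up toy shapes:

```python
import numpy as np

# Toy data: 5 samples, 3 input features, 2 outputs (made-up shapes).
n, d, n_out = 5, 3, 2
x = np.random.default_rng(0).normal(size=(n, d))
y = np.random.default_rng(1).normal(size=(n, n_out))

# Stack the inputs once per output and tag each block with its
# output index in an extra last column.
xx = np.vstack([np.hstack([x, np.full((n, 1), i)]) for i in range(n_out)])

# Stack the matching output columns into a single column vector.
yy = np.vstack([y[:, [i]] for i in range(n_out)])

print(xx.shape, yy.shape)  # (10, 4) (10, 1)
```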
    

    After the optimization has converged, one can call get_most_significant_input_dimensions() on an individual latent GP kernel (here for output 0):

    sig_inputs_0 = m.sum.ICM0.get_most_significant_input_dimensions()
    

    or loop over all kernels:

    sig_inputs = []
    for part in m.kern.parts:
        sig_inputs.append(part.get_most_significant_input_dimensions())
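
    Under ARD, this ranking is driven by the fitted lengthscales: the smaller a dimension's lengthscale, the faster that latent GP varies along it and the more relevant the input is. A NumPy-only sketch of that relationship, using made-up lengthscale values rather than a trained model:

```python
import numpy as np

# Hypothetical ARD lengthscales of one latent kernel after training,
# one value per input dimension (made-up numbers).
lengthscales = np.array([5.0, 0.3, 12.0, 1.1])

# A smaller lengthscale means higher relevance, so rank dimensions
# by inverse lengthscale, most significant first.
ranking = np.argsort(1.0 / lengthscales)[::-1]
print(ranking)  # [1 3 0 2]
```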