python scikit-learn logistic-regression matrix-inverse singular

What causes the singular matrix error in kernel ridge regression and how to fix it?


When building a KTBoost model, I got the following error message:

1402                             K.flat[::K.shape[0] + 1] += self.alphaReg
1403                             self.solve_kernel=linalg.inv(K)
1404  modi = KernelRidge(alpha=self.alphaReg,theta=self.theta,kernel_mat=self.kernel_mat,
1405      solve_kernel=self.solve_kernel,kernel=self.kernel,n_neighbors=self.n_neighbors


/data/anaconda3/lib/python3.7/site-packages/scipy/linalg/basic.py in inv(a, overwrite_a, check_finite)
977         inv_a, info = getri(lu, piv, lwork=lwork, overwrite_lu=1)
978     if info > 0:
979         raise LinAlgError("singular matrix")
980     if info < 0:
981         raise ValueError('illegal value in %d-th argument of internal '

LinAlgError: singular matrix

I understand that this is usually caused by duplicated or highly correlated variables, so I excluded such variables before feeding the dataset to the function; however, the problem persisted.

In addition, adding small random noise to the dataset did not work; however, when I created a dummy variable holding random numbers in (0, 1) and appended it to the dataset, the problem disappeared and the model ran well.

Could someone give me some insight into what caused this issue, please?


Solution

  • A singular matrix is most likely caused by features that are very close to each other (or exact duplicates). I assume you tried adding noise to the features, since adding noise to the labels does not help with this. In any case, it is possible to have duplicate features, but one then needs to add some regularization using the parameter 'alphaReg'. This value is added to the diagonal of the kernel matrix and thus helps to avoid singular matrices.

    What value have you set for the regularization parameter 'alphaReg'? Until version 0.1.18, the default value was wrongly set to 0 instead of 1; the value 1 is what the documentation says and also what scikit-learn uses. I have corrected this now. Can you please check whether the error still occurs when using KTBoost version >= 0.1.19?

    If this does not solve the issue, can you please provide a minimal working example with data and code to reproduce the error? Otherwise, it is difficult to tell what is happening.

    In the future, you might also open an issue on https://github.com/fabsig/KTBoost. It will be answered faster there.
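
To make the role of 'alphaReg' concrete, here is a minimal NumPy sketch. It is not KTBoost's actual code: the Gaussian kernel form, the helper name gaussian_kernel, and the toy data are assumptions made purely for illustration. It shows that two identical samples produce two identical rows in the kernel matrix (so the matrix loses rank), that adding a positive value to the diagonal restores full rank, and that an appended random column has a similar effect because the two copies are no longer identical. The last point is only one plausible reading of the behaviour described in the question.

```python
import numpy as np


def gaussian_kernel(X, theta=1.0):
    """Kernel matrix with entries exp(-||x_i - x_j||^2 / theta^2).

    The exact kernel form is an assumption made for this illustration.
    """
    diffs = X[:, None, :] - X[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    return np.exp(-sq_dists / theta ** 2)


# Five distinct samples plus one exact copy of the first sample (toy data).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [1.0, 1.0], [2.0, 2.0], [0.0, 0.0]])
y = np.array([0.0, 1.0, 1.0, 2.0, 4.0, 0.0])
n = X.shape[0]

# Duplicate samples give two identical rows in K, so K loses rank.
K = gaussian_kernel(X)
rank_unregularized = np.linalg.matrix_rank(K)

# Same diagonal update as in the traceback: K + alpha * I.
alpha_reg = 1.0
K_reg = K.copy()
K_reg.flat[::n + 1] += alpha_reg
rank_regularized = np.linalg.matrix_rank(K_reg)
cond_regularized = np.linalg.cond(K_reg)

# Solving the linear system avoids forming an explicit inverse.
weights = np.linalg.solve(K_reg, y)

# A random extra column turns the two copies into distinct samples.
rng = np.random.default_rng(0)
X_aug = np.hstack([X, rng.uniform(0.0, 1.0, size=(n, 1))])
rank_augmented = np.linalg.matrix_rank(gaussian_kernel(X_aug))

print(rank_unregularized, rank_regularized, rank_augmented, n)
```

Note that the kernel matrix is indexed by samples, not by variables, so in this sketch it is duplicated or near-identical rows of the dataset that matter. Removing correlated columns leaves such rows in place, which may be why that step did not help.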