
sklearn for LASSO (BIC tuning)

We encountered a problem when using the LASSO-related functions in sklearn. Since LASSO with BIC tuning only changes alpha, the results of LASSO with BIC (1) should be equivalent to those of LASSO with the fixed optimal alpha (2):

(1) linear_model.LassoLarsIC
(2) linear_model.Lasso
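For reference, the scikit-learn documentation gives the same optimization objective for both estimators, with $n$ the number of samples. This is why a fixed alpha would be expected to reproduce the BIC-tuned fit; LassoLarsIC only adds the step of choosing alpha by BIC along the LARS path.

```latex
\hat{w}(\alpha) = \arg\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2 \;+\; \alpha\,\lVert w \rVert_1
```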

First, consider a simple data-generating process (DGP):

################## DGP ##################
import numpy as np
from sklearn import linear_model

np.random.seed(10)
T = 200     # sample size
p = 100     # number of regressors
X = np.random.normal(size = (T, p))
u = np.random.normal(size = T)
beta = np.hstack((np.array([5, 0, 3, 0, 1, 0, 0, 0, 0, 0]), np.zeros(p-10)))
y = np.dot(X, beta) + u

Then we fit LASSO with BIC tuning using linear_model.LassoLarsIC:

# LASSO with BIC
lasso = linear_model.LassoLarsIC(criterion='bic')
lasso.fit(X,y)
print("lasso coef = \n {}".format(lasso.coef_))
print("lasso optimal alpha = {}".format(lasso.alpha_))

lasso coef = 
 [ 4.81934044  0.          2.87574831  0.          0.90031582  0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.01705965  0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
 -0.07789506  0.          0.05817856  0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.        ]
lasso optimal alpha = 0.010764484244859006
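To see where alpha_ comes from, one can look at the path that LassoLarsIC stores after fitting: alphas_ holds the alpha at each step of the LARS path and criterion_ the corresponding information-criterion value. The sketch below reuses the DGP above and checks that alpha_ is simply the path alpha with the lowest BIC.

```python
import numpy as np
from sklearn import linear_model

# Same data-generating process as in the question
np.random.seed(10)
T, p = 200, 100
X = np.random.normal(size=(T, p))
u = np.random.normal(size=T)
beta = np.hstack((np.array([5, 0, 3, 0, 1, 0, 0, 0, 0, 0]), np.zeros(p - 10)))
y = X @ beta + u

lasso = linear_model.LassoLarsIC(criterion='bic').fit(X, y)

# alphas_: alpha at each LARS step; criterion_: BIC at each of those steps
best = np.argmin(lasso.criterion_)
print("points on the path:", len(lasso.alphas_))
print("alpha with lowest BIC:", lasso.alphas_[best])
print("reported alpha_:      ", lasso.alpha_)
```

Plotting criterion_ against alphas_ is a quick way to check whether the BIC minimum is sharp or sits on a flat stretch of the path.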

Next, we plug that optimal alpha into plain LASSO using linear_model.Lasso:

# LASSO with fixed alpha
clf = linear_model.Lasso(alpha=lasso.alpha_)
clf.fit(X,y)
print("lasso coef = \n {}".format(clf.coef_))

lasso coef = 
 [ 4.93513468e+00  5.42491624e-02  3.00412571e+00 -3.83394653e-02
  9.87262697e-01  5.21693412e-03 -2.89977454e-02 -1.40952930e-01
  5.18653123e-02 -7.66271662e-02 -1.99074552e-02  2.72228580e-02
 -1.01217167e-01 -4.69445223e-02  1.74378470e-01  2.52655725e-02
  1.84902632e-02 -7.11030674e-02 -4.15940817e-03  1.98229236e-02
 -8.81779536e-02 -3.59094431e-02  5.53212537e-03  9.23031418e-02
  1.21577471e-01 -4.73932893e-03  5.15459727e-02  4.17136419e-02
  4.49561794e-02 -4.74874460e-03  0.00000000e+00 -3.56968194e-02
 -4.43094631e-02  0.00000000e+00  1.00390051e-03  7.17980301e-02
 -7.39058574e-02  1.73139031e-02  7.88996602e-02  1.04325618e-01
 -4.10356303e-02  5.94564069e-02  0.00000000e+00  9.28354383e-02
  0.00000000e+00  4.57453873e-02  0.00000000e+00  0.00000000e+00
 -1.94113178e-02  1.97056365e-02 -1.17381604e-01  5.13943798e-02
  2.11245596e-01  4.24124220e-02  1.16573094e-01  1.19551223e-02
 -0.00000000e+00 -0.00000000e+00 -8.35210244e-02 -8.29230887e-02
 -3.16409003e-02  8.43274240e-02 -2.90949577e-02 -0.00000000e+00
  1.24697858e-01 -3.07120380e-02 -4.34558350e-02 -0.00000000e+00
  1.30491858e-01 -2.04573808e-02  6.72141775e-02 -6.85563204e-02
  5.64781612e-02 -7.43380132e-02  1.88610065e-01 -5.53155313e-04
  0.00000000e+00  2.43191722e-02  9.10973250e-02 -4.49945551e-02
  3.36006276e-02 -0.00000000e+00 -3.85862475e-02 -9.63711465e-02
 -2.07015665e-01  8.67164869e-02  1.30776709e-01 -0.00000000e+00
  5.42630086e-02 -1.44763258e-01 -0.00000000e+00 -3.29485283e-02
 -2.35245212e-02 -6.19975427e-02 -8.83892134e-03 -1.60523703e-01
  9.63008989e-02 -1.06953313e-01  4.60206741e-02  6.02880434e-02]
-0.06321829752708413

The two sets of coefficients are clearly different: the Lasso fit has far more nonzero coefficients than the LassoLarsIC fit.
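To make the difference concrete, a small helper can list which coefficients are nonzero in one fit but not the other, plus the largest absolute gap. compare_supports is a hypothetical name for illustration, not part of scikit-learn, and the toy vectors stand in for lasso.coef_ and clf.coef_ from the question.

```python
import numpy as np

def compare_supports(coef_a, coef_b, atol=0.0):
    """Compare two coefficient vectors (hypothetical helper for illustration).

    Returns the indices that are nonzero (|coef| > atol) in only one of the
    vectors, and the largest absolute elementwise difference.
    """
    coef_a = np.asarray(coef_a, dtype=float)
    coef_b = np.asarray(coef_b, dtype=float)
    nz_a = np.flatnonzero(np.abs(coef_a) > atol)
    nz_b = np.flatnonzero(np.abs(coef_b) > atol)
    return {
        "only_a": np.setdiff1d(nz_a, nz_b),
        "only_b": np.setdiff1d(nz_b, nz_a),
        "max_abs_diff": float(np.max(np.abs(coef_a - coef_b))),
    }

# Toy example; with the question's fits this would be compare_supports(lasso.coef_, clf.coef_)
result = compare_supports([1.0, 0.0, 2.0], [1.0, 0.5, 0.0])
print(result)
```

Passing a small atol such as 1e-8 treats coefficients that are zero only up to solver precision as zero.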

Why does this happen?

Solution

The main difference I could find off the bat is the max_iter parameter: it defaults to 1000 for Lasso and to 500 for LassoLarsIC.
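One way to test the max_iter explanation is to give both estimators the same iteration budget and check whether coordinate descent actually converged. This is a sketch on the question's DGP; the budget of 1000 is an arbitrary choice, and max_iter counts different things in the two solvers (LARS steps versus coordinate-descent passes), so equal numbers are not strictly equivalent.

```python
import warnings

import numpy as np
from sklearn import linear_model
from sklearn.exceptions import ConvergenceWarning

# Same data-generating process as in the question
np.random.seed(10)
T, p = 200, 100
X = np.random.normal(size=(T, p))
u = np.random.normal(size=T)
beta = np.hstack((np.array([5, 0, 3, 0, 1, 0, 0, 0, 0, 0]), np.zeros(p - 10)))
y = X @ beta + u

# Same iteration budget for both estimators (1000 is an arbitrary choice)
ic = linear_model.LassoLarsIC(criterion='bic', max_iter=1000).fit(X, y)

# Record any ConvergenceWarning raised by coordinate descent
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ConvergenceWarning)
    cd = linear_model.Lasso(alpha=ic.alpha_, max_iter=1000).fit(X, y)

converged = not any(issubclass(w.category, ConvergenceWarning) for w in caught)
print("Lasso converged:", converged, "after", cd.n_iter_, "iterations")
print("max |coef diff|:", np.max(np.abs(ic.coef_ - cd.coef_)))
```

If Lasso converges well within the budget and the coefficients still differ, max_iter alone is unlikely to explain the gap.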

Other hyperparameters of Lasso, such as tol and selection, cannot be matched because LassoLarsIC does not expose them.
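get_params() makes it easy to confirm which settings the two estimators share, which defaults differ, and which parameters exist on only one of them. The exact output depends on the installed scikit-learn version; older releases, for example, also exposed normalize, which defaulted to True for LassoLarsIC and to False for Lasso.

```python
from sklearn import linear_model

ic_params = linear_model.LassoLarsIC(criterion='bic').get_params()
cd_params = linear_model.Lasso().get_params()

# Parameters accepted by both estimators but with different default values
shared = sorted(set(ic_params) & set(cd_params))
differing = {k: (ic_params[k], cd_params[k]) for k in shared if ic_params[k] != cd_params[k]}
for name, (ic_val, cd_val) in differing.items():
    print(f"{name}: LassoLarsIC={ic_val!r}, Lasso={cd_val!r}")

# Parameters that only one of the two estimators exposes
only_ic = sorted(set(ic_params) - set(cd_params))
only_cd = sorted(set(cd_params) - set(ic_params))
print("only LassoLarsIC:", only_ic)
print("only Lasso:", only_cd)
```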

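There may also be more nuanced differences in the exact implementation of the two models; for one, LassoLarsIC computes its solution with the LARS algorithm, whereas Lasso uses coordinate descent.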
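To separate the BIC selection step from solver differences, one can refit at the chosen alpha with LassoLars (the same LARS algorithm, with alpha fixed) and with Lasso using a much tighter tol. If LassoLars reproduces LassoLarsIC but a tightly converged Lasso does not, that suggests the gap comes from how the estimators treat the data (for example, preprocessing defaults in the installed version) rather than from convergence. The tol and max_iter values below are arbitrary choices.

```python
import numpy as np
from sklearn import linear_model

# Same data-generating process as in the question
np.random.seed(10)
T, p = 200, 100
X = np.random.normal(size=(T, p))
u = np.random.normal(size=T)
beta = np.hstack((np.array([5, 0, 3, 0, 1, 0, 0, 0, 0, 0]), np.zeros(p - 10)))
y = X @ beta + u

ic = linear_model.LassoLarsIC(criterion='bic').fit(X, y)

# Same LARS solver, but alpha fixed at the BIC choice instead of selected
lars_fixed = linear_model.LassoLars(alpha=ic.alpha_).fit(X, y)

# Coordinate descent with a much tighter tolerance than the default (tol=1e-4)
cd_tight = linear_model.Lasso(alpha=ic.alpha_, tol=1e-8, max_iter=100_000).fit(X, y)

gap_lars = np.max(np.abs(lars_fixed.coef_ - ic.coef_))
gap_cd = np.max(np.abs(cd_tight.coef_ - ic.coef_))
print("LassoLars (fixed alpha) vs LassoLarsIC:", gap_lars)
print("Lasso (tol=1e-8) vs LassoLarsIC:       ", gap_cd)
```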