Question about a scikit-learn example: "MNIST classification using multinomial logistic + L1"


https://scikit-learn.org/stable/auto_examples/linear_model/plot_sparse_logistic_regression_mnist.html

I have two questions about this example:

Q1: In theory, the more data there is, the less regularization should be needed. But "C=50.0 / train_samples" appears to make the regularization stronger as the amount of data grows. Is this a defect in the code, or my misunderstanding?

    clf = LogisticRegression(C=50.0 / train_samples, penalty="l1", solver="saga", tol=0.1)

Q2: Why is it possible to draw the final digit images using only the coef matrix? What is the idea behind it?


Solution

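  • C is the inverse of the regularization strength: a higher C means weaker regularization, a lower C means stronger regularization. With C = 50.0 / train_samples, C does decrease as the number of training samples grows, but this does not make the regularization stronger relative to the data. scikit-learn multiplies C by the sum of the per-sample losses, not their mean, so with a fixed C the data term grows with the number of samples while the penalty stays the same size. Dividing C by train_samples cancels that growth: the objective becomes the L1 penalty plus 50 times the mean loss, so the balance between penalty and data fit stays the same whatever the sample size. The scaling is intentional, but its purpose is to keep the regularization per sample constant, not to increase it.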
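The effect of dividing C by the sample count can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses a binary problem with labels in {-1, +1}, no intercept, random data, and a hypothetical helper `sklearn_style_objective` that evaluates the L1-penalized objective in the form the scikit-learn user guide gives (up to an overall scale factor). It does not call LogisticRegression itself.

```python
import numpy as np

def sklearn_style_objective(w, X, y, C):
    """Hypothetical helper: ||w||_1 + C * sum_i log(1 + exp(-y_i * x_i.w)).

    Labels y_i are assumed to be in {-1, +1}; the intercept is omitted.
    """
    margins = y * (X @ w)
    total_loss = np.sum(np.logaddexp(0.0, -margins))  # log(1 + exp(-m)), summed
    return np.abs(w).sum() + C * total_loss

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.where(rng.random(200) < 0.5, -1.0, 1.0)
w = rng.normal(size=5)  # an arbitrary weight vector to evaluate

# The same data repeated 10 times: 10x the samples, identical mean loss.
X_big = np.tile(X, (10, 1))
y_big = np.tile(y, 10)

# Fixed C: the data term grows 10x, so the penalty matters less.
fixed_small = sklearn_style_objective(w, X, y, C=0.25)
fixed_big = sklearn_style_objective(w, X_big, y_big, C=0.25)

# C = 50 / n: the objective is ||w||_1 + 50 * mean loss for any n.
scaled_small = sklearn_style_objective(w, X, y, C=50.0 / len(y))
scaled_big = sklearn_style_objective(w, X_big, y_big, C=50.0 / len(y_big))

penalty = np.abs(w).sum()
print("penalty share, fixed C:", penalty / fixed_small, penalty / fixed_big)
print("penalty share, C=50/n :", penalty / scaled_small, penalty / scaled_big)
```

With a fixed C the penalty's share of the objective shrinks as the data grows, while with C = 50 / n the two shares are the same.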

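    Coefficient matrix (coef_): a fitted linear model stores one weight per class and per pixel, so coef_ has shape (n_classes, n_features), here (10, 784). The score for a class is the weighted sum of the pixel values with that class's row of weights, so each weight shows how strongly a pixel pushes the prediction toward the class (positive) or away from it (negative). Visualization: reshaping a row to the image dimensions (28 x 28) places every weight at its pixel's position, which effectively "draws" the digit as the model sees it. The L1 penalty sets many weights to exactly zero, which keeps these images sparse.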
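The idea that each row of coef_ is a per-pixel weight image can be sketched as follows. To stay small and offline, this sketch swaps in scikit-learn's bundled 8 x 8 digits dataset (load_digits) for MNIST, so the weight images are 8 x 8 and not 28 x 28; the names `templates` and `scores_manual` are illustrative and not part of the original example.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Assumption: small 8x8 digits dataset as a stand-in for MNIST.
X, y = load_digits(return_X_y=True)          # X has shape (1797, 64)
X = StandardScaler().fit_transform(X)

# Same settings as in the question, with C scaled by the sample count.
clf = LogisticRegression(
    C=50.0 / X.shape[0], penalty="l1", solver="saga", tol=0.1, random_state=0
)
clf.fit(X, y)

# One row of weights per class, one weight per pixel.
coef = clf.coef_                             # shape (10, 64)
templates = coef.reshape(10, 8, 8)           # one weight image per digit

# The class scores are just weighted sums of pixels plus an intercept.
scores_manual = X @ coef.T + clf.intercept_

# Score of sample 0 for class 3, written as "image times weight image".
k = 3
score_k = np.sum(templates[k] * X[0].reshape(8, 8)) + clf.intercept_[k]

print("weight images:", templates.shape)
print("fraction of zero weights:", np.mean(coef == 0))
```

Each `templates[k]` can be passed to matplotlib's `imshow` with a diverging colormap, as the linked example does with its 28 x 28 rows.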