python scikit-learn lasso-regression

Creating a DF inside of the lasso model


In this project I am running a lasso model:

def build_and_fit_lasso_model(X, y):
    """Creates and returns a LASSO model fitted to the given predictors X
    and target y.
    """
    model = LassoLarsCV(cv=10, precompute=False)
    model = model.fit(X.values, y.values)
    return model

lasso_model = build_and_fit_lasso_model(X_train, y_train)
lasso_model

After running it, I want to create a function that returns a DataFrame with the variable names and coefficients of the fitted lasso model. Here is the code that I have:

def get_coefficients(model, X):
    """Returns a DataFrame containing the columns `label` and `coeff` which are
    the coefficients by column name.
    """
    predictors_model = pd.DataFrame(filtered_data)  # filtered_data is the name of the df used in the model
    predictors_model.columns = ['label']
    predictors_model['coeff'] = model.coef_
    return predictors_model

When I run this code:

coefficients = get_coefficients(lasso_model, X)

I get the error "ValueError: Length mismatch: Expected axis has 19 elements, new values have 1 elements".


Solution

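  • You get that error for two reasons: 1. X is passed to the function but never used, and 2. the dimensions are wrong. pd.DataFrame(filtered_data) builds a DataFrame as large as your input data, which has 19 columns, so assigning the one-element list ['label'] to its columns raises the length mismatch. So let's say your data looks like this: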

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LassoLarsCV

    def build_and_fit_lasso_model(X, y):
        # Fit on the arguments rather than on the global X_train / y_train
        model = LassoLarsCV(cv=10, precompute=False)
        model = model.fit(X.values, y.values)
        return model

    # Toy data: 50 rows, 5 standard-normal predictors and an unrelated target
    df = pd.DataFrame(np.random.normal(0, 1, (50, 5)), columns=['x1', 'x2', 'x3', 'x4', 'x5'])
    df['y'] = np.random.normal(0, 1, 50)

    X_train = df[['x1', 'x2', 'x3', 'x4', 'x5']]
    y_train = df['y']
    lasso_model = build_and_fit_lasso_model(X_train, y_train)
    

    A quick way is to put the coefficients into the DataFrame and add the column names as another column:

    def get_coefficients(model, X):
        # One row per predictor: X.columns and model.coef_ have the same length
        predictors_model = pd.DataFrame({'label': X.columns, 'coeff': model.coef_})
        return predictors_model
    
    get_coefficients(lasso_model,X_train)
    
      label  coeff
    0    x1    0.0
    1    x2    0.0
    2    x3    0.0
    3    x4    0.0
    4    x5    0.0
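
To see where the original error comes from, here is a minimal sketch that reproduces it without fitting a model. The small filtered_data frame and the fake_coefs list are made-up stand-ins for illustration (the real data has 19 columns, and the real coefficients come from model.coef_). The first half repeats the failing pattern from the question, and the second half builds the frame from two sequences of equal length, as in the fix above.

```python
import pandas as pd

# Hypothetical stand-in for filtered_data: 3 rows, 5 columns.
filtered_data = pd.DataFrame(
    [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]],
    columns=["x1", "x2", "x3", "x4", "x5"],
)

# Same pattern as the original get_coefficients: wrap the whole frame,
# then try to rename all of its columns with a one-element list.
predictors_model = pd.DataFrame(filtered_data)
try:
    predictors_model.columns = ["label"]
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# The fix builds the frame from two sequences of equal length instead:
# one entry per column name, one entry per coefficient.
fake_coefs = [0.0, 0.5, 0.0, -1.2, 0.0]  # made-up values, not a fitted model
fixed = pd.DataFrame({"label": filtered_data.columns, "coeff": fake_coefs})
print(fixed.shape)
```

The resulting frame has one row per predictor, which is why the dictionary-based version works for any number of columns as long as X is the same frame the model was fitted on.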