python scikit-learn roc sklearn-pandas

error in calculating AUC ROC in python


I am implementing linear regression in Python using sklearn.

I have successfully trained a model using the linear_model.LinearRegression() function.

Now I want to measure the goodness of fit of the model using the AUC ROC method. I am using the following code to do so:

train_set[predictors1], train_set["loan_status"] = make_classification(n_samples=4000, n_features=2, n_redundant=0, flip_y=0.4)
train, test, train_t, test_t = train_test_split(train_set[predictors1], train_set["loan_status"], train_size=0.9)

rf.fit(train, train_t)

But I am getting the following error on line 1:

ValueError: Must have equal len keys and value when setting with an ndarray


Solution

  • The documentation for make_classification says the following:

    Returns:
    X : array of shape [n_samples, n_features] The generated samples.

    y : array of shape [n_samples] The integer labels for class membership of each sample.

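    It looks like the issue is that X is a single 2-D array with one column per feature (two columns here, since n_features=2), and you are assigning that whole array to DataFrame columns that do not match its shape. That is most likely because predictors1 does not name exactly two columns, or because the row counts differ. Pick the feature column you want and assign it to the desired DataFrame column. Note that _X[0] would select the first row (one sample), so slice by column instead:

    _X, df['loan_status'] = make_classification()
    df['my_col'] = _X[:, 0]  # first feature column, one value per sample
    # or
    df['my_col'] = _X[:, 1]  # second feature column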
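As a minimal end-to-end sketch of the approach described in this answer, the following builds the DataFrame from the generated array and then computes the ROC AUC. It rests on several assumptions that are not in the question: the column names `feature_0` and `feature_1` are hypothetical stand-ins for `predictors1`, `LogisticRegression` is used as a stand-in classifier because the question does not show how `rf` was defined, and `random_state=0` is added only to make the run repeatable.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical column names standing in for `predictors1` from the question.
predictors = ["feature_0", "feature_1"]

# X has shape (4000, 2): one row per sample, one column per feature.
# y has shape (4000,): one integer class label per sample.
X, y = make_classification(
    n_samples=4000, n_features=2, n_redundant=0, flip_y=0.4, random_state=0
)

# Build the frame so the number of column names matches X's column count.
train_set = pd.DataFrame(X, columns=predictors)
train_set["loan_status"] = y

train, test, train_t, test_t = train_test_split(
    train_set[predictors], train_set["loan_status"], train_size=0.9, random_state=0
)

# Stand-in classifier (assumption): ROC AUC needs a score per sample,
# here the predicted probability of the positive class.
model = LogisticRegression()
model.fit(train, train_t)
scores = model.predict_proba(test)[:, 1]

auc = roc_auc_score(test_t, scores)
print(f"ROC AUC: {auc:.3f}")
```

Passing `columns=predictors` to the `pd.DataFrame` constructor keeps the column count and the array shape in agreement, which avoids assigning a 2-D array to a mismatched set of columns.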