
How to view specific rows that my logistic regression has classified

I am currently working on a logistic regression model to predict the outcome of certain trades. The model classifies trades in the test set as good/bad (1/0). I want to see which trades end up in each group and multiply the trades classified as "good" by their profit/loss to find out whether the model is actually profitable. Is there a way to view row-specific info for the entries that the model classifies as True/False?

This is my code for scaling the data and splitting it into training and test sets:

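from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

x = df[x_train_features]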
y = df["y"]
y = y.astype("int")

# scale data
scaler = MinMaxScaler()
scaledx = scaler.fit_transform(x)

# split training data into test and training sets
X_train, X_test, y_train, y_test = train_test_split(scaledx, y, test_size=0.25)

# instantiate the model (using the default parameters)
logreg = LogisticRegression()

# fit the model with data
logreg.fit(X_train, y_train)

y_pred_test = (logreg.predict_proba(X_test)[:, 1] >= 0.5).astype(bool)

I tried to use df.loc[y_pred_test == True], but I get the error:

Boolean index has wrong length: 720 instead of 2880

Most likely this is because the test set (720 rows) is smaller than the whole dataset (2880 rows).
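One way to avoid the length mismatch is to keep the scaled features as a DataFrame that reuses df's index. `X_test.index` then tells you which rows of df landed in the test set, and the boolean predictions can be applied to that index instead of to df. This is a minimal sketch on a small synthetic DataFrame; the columns `f1`/`f2`, the random data and `random_state=0` are assumptions, not part of the question:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for the trades DataFrame: two features and a 0/1 label
rng = np.random.default_rng(0)
df = pd.DataFrame({"f1": rng.normal(size=200), "f2": rng.normal(size=200)})
df["y"] = (df["f1"] + 0.5 * rng.normal(size=200) > 0).astype(int)
x_train_features = ["f1", "f2"]

x = df[x_train_features]
y = df["y"]

# Wrap the scaled array back into a DataFrame that reuses df's index,
# so every row keeps its original label through the split
scaledx = pd.DataFrame(MinMaxScaler().fit_transform(x), index=x.index, columns=x.columns)

X_train, X_test, y_train, y_test = train_test_split(scaledx, y, test_size=0.25, random_state=0)

logreg = LogisticRegression()
logreg.fit(X_train, y_train)
y_pred_test = logreg.predict_proba(X_test)[:, 1] >= 0.5  # boolean mask, one entry per test row

# Apply the mask to X_test.index (same length), then look the rows up in df
good_rows = df.loc[X_test.index[y_pred_test]]
bad_rows = df.loc[X_test.index[~y_pred_test]]
print(len(X_test), len(good_rows) + len(bad_rows))  # 50 50
```

Because the mask and `X_test.index` have the same length, the lookup never hits the "wrong length" error, and `good_rows` / `bad_rows` contain the full original rows of each group.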


  • The error occurs because your prediction values are not aligned with the rows of df. You might try attaching them to the test labels first:

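    # reuse y_test's index (the original df row labels) so the rows line up
    y_pred_test = pd.Series(y_pred_test, index=y_test.index, name="y_pred")
    results = pd.concat([y_test, y_pred_test], axis=1)

    This will combine your prediction values with the ground truth, row by row. Then you can select the trades predicted as good, or look up their full rows in df:

    results[results["y_pred"]]
    df.loc[results.index[results["y_pred"]]]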

    You get the error because you predicted only on the test set, not on the whole dataset (df): y_pred_test has 720 rows, while df has 2880.
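An end-to-end sketch of this approach on made-up data, including the profit check from the question. The feature names `f1`/`f2`, the `profit` column and `random_state=0` are assumptions. It relies on `train_test_split` keeping a pandas Series' index, so `y_test.index` holds the original df row labels even though the features are a plain NumPy array:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Hypothetical trades: two features, a 0/1 label and a per-trade "profit" column
rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({"f1": rng.normal(size=n), "f2": rng.normal(size=n)})
df["y"] = (df["f1"] > 0).astype(int)
df["profit"] = np.where(df["y"] == 1, 5.0, -5.0) + rng.normal(size=n)

scaledx = MinMaxScaler().fit_transform(df[["f1", "f2"]])  # plain NumPy array, as in the question
y = df["y"]

# y is a pandas Series, so y_test keeps df's original row labels after the split
X_train, X_test, y_train, y_test = train_test_split(scaledx, y, test_size=0.25, random_state=0)
logreg = LogisticRegression().fit(X_train, y_train)

# Give the predictions the same index as y_test before concatenating;
# a fresh 0..n-1 index would misalign the rows and fill them with NaN
y_pred = pd.Series(logreg.predict_proba(X_test)[:, 1] >= 0.5, index=y_test.index, name="y_pred")
results = pd.concat([y_test, y_pred], axis=1).join(df["profit"])

# Row-level view of each predicted group, plus count and total profit per group
predicted_good = results[results["y_pred"]]
predicted_bad = results[~results["y_pred"]]
print(results.groupby("y_pred")["profit"].agg(["count", "sum"]))
print("P&L of predicted-good trades:", predicted_good["profit"].sum())
```

Comparing the summed `profit` of `predicted_good` with the sum over all test trades gives a rough indication of whether trading only on the model's "good" calls would have helped on this split.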