
Python Logistic Regression Y Value Issues


I'm currently getting a mixture of the following errors:

  • ValueError: Unknown label type: 'unknown'
  • ValueError: Expected 2D array, got 1D array instead: array=[0. 0. 0. ... 1. 1. 1.]. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
  • TypeError: 'tuple' object is not callable

When I search for others who have had the same issue, the suggested fixes just move me from one of the above errors to another. My code is below. Lines 7-9 are fixes I found for these errors, but each one just leads to a different error. If I comment out line 8, line 9, or both, I get the wrong-shape error; if I comment out all three, I get the unknown label type error.

For line 7, I have tried bool, int, and float.


    df.loc[df['ReAdmis'] == 'No', 'ReAdmis'] = "False"
    df.loc[df['ReAdmis'] == 'Yes', 'ReAdmis'] = "True"

    log_ra = df['ReAdmis']

    print(log_ra.head())
    log_ra=log_ra.astype('bool')
    # log_ra=log_ra.to_numpy()
    log_ra=log_ra.reshape(-1,1)

    model = LogisticRegression(solver='liblinear')
    logistic_regression = model.fit(df1,log_ra)
    model.score(log_ra, df1)

I am using masks to convert Yes/No to 1/0 for my y values. Is that what is causing this issue? I found a lot of great articles when I was working on the multiple regression version of this, but logistic regression seems to be less widely covered, and I'm not finding as many helpful articles on it.


Solution

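  • Line 9: Note that shape is a tuple and an attribute of the DataFrame (and Series) object, so you can access it but not call it; calling it is what raises TypeError: 'tuple' object is not callable. See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.shape.html

    Maybe you wanted to use reshape there? Keep in mind that a pandas Series has no reshape method in current pandas versions, so you would need log_ra.to_numpy().reshape(-1, 1). Also, the target passed to fit is expected to be one-dimensional, with shape (n_samples,); reshape(-1, 1) is meant for a single-feature X, not for y.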

  • Line 7: astype(float) changes the dtype to float (see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.astype.html). If you want to replace Yes and No with True and False respectively, assign the booleans True and False on lines 1 and 2, not the strings "True" and "False": astype(bool) treats every non-empty string as True, so "False" would also become True. After that, you can use df["ReAdmis"] = df["ReAdmis"].astype(bool) to set the column's type to bool.

    Example:

    >>> import pandas as pd
    >>> df = pd.DataFrame([{"ReAdmis": "No"}, {"ReAdmis": "Yes"}])
    >>> df.loc[df["ReAdmis"] == "Yes", "ReAdmis"] = True
    >>> df.loc[df["ReAdmis"] == "No", "ReAdmis"] = False
    >>> # The ReAdmis column still has dtype object; convert it to bool explicitly
    >>> df["ReAdmis"] = df["ReAdmis"].astype(bool)
    >>> print(df)
       ReAdmis
    0    False
    1     True
    >>> print(df.dtypes)
    ReAdmis    bool
    dtype: object
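The label conversion can also be checked on a toy column. This is a minimal sketch, assuming a hypothetical frame with only a ReAdmis column of Yes/No strings. It swaps the loc masks for Series.map, which is one common way to do the same conversion in a single step. It also shows the string pitfall from line 7 and that shape is an attribute rather than a method. As far as I know, scikit-learn's label check reports Unknown label type: 'unknown' when y has object dtype holding non-string values, which is why ending up with a real bool dtype matters.

```python
import pandas as pd

# Hypothetical toy data standing in for the question's ReAdmis column.
df = pd.DataFrame({"ReAdmis": ["No", "Yes", "No", "Yes"]})

# Pitfall: assigning the *strings* "False"/"True" and then casting.
# Any non-empty string is truthy, so every row becomes True.
as_strings = df["ReAdmis"].replace({"No": "False", "Yes": "True"})
print(as_strings.astype(bool).tolist())  # [True, True, True, True]

# Map each label directly to a real boolean; the result has bool dtype.
df["ReAdmis"] = df["ReAdmis"].map({"No": False, "Yes": True})
print(df["ReAdmis"].dtype)     # bool
print(df["ReAdmis"].tolist())  # [False, True, False, True]

# shape is a tuple attribute: read it, do not call it.
print(df["ReAdmis"].shape)     # (4,)
# df["ReAdmis"].shape(-1, 1) would raise TypeError: 'tuple' object is not callable
```

Mapping to 0/1 integers instead ({"No": 0, "Yes": 1}) should work just as well for LogisticRegression, since both give it a binary target.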
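The "Expected 2D array, got 1D array" error most likely comes from the last line of the question, model.score(log_ra, df1), where the arguments are swapped: in scikit-learn, fit(X, y) and score(X, y) both take the features first and the labels second, so the 1-D label vector ends up in the X slot. A minimal sketch, assuming a hypothetical two-column feature frame in place of the question's df1 and made-up labels:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical features standing in for the question's df1.
df1 = pd.DataFrame({
    "Age": [25, 40, 61, 35, 70, 52, 29, 66],
    "Days": [2, 5, 9, 3, 12, 7, 1, 10],
})
# Hypothetical target: a 1-D boolean Series of shape (n_samples,), not reshaped.
log_ra = pd.Series([False, False, True, False, True, True, False, True], name="ReAdmis")

model = LogisticRegression(solver="liblinear")
model.fit(df1, log_ra)               # fit(X, y): features first, labels second

accuracy = model.score(df1, log_ra)  # score(X, y): same argument order as fit
print(f"training accuracy: {accuracy:.2f}")

# Swapping the arguments, as in model.score(log_ra, df1), passes the 1-D labels
# as X and should raise "ValueError: Expected 2D array, got 1D array instead".
```

With y kept one-dimensional like this, the reshape on line 9 is not needed at all.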