
Error while converting continuous data to categorical data in Logistic Regression


I am fitting a logistic regression on a dataset whose target variable is coded as 0s and 1s. I used the .replace() function to relabel those values:

data['target'] = data['target'].replace({0: "No", 1: "yes"})

That ran fine, but when I fit the model,

model_log = sm.Logit(data['target'], data.iloc[:, 2:]).fit()

I get the following error:

ValueError: Pandas data cast to numpy dtype of object. Check input data with np.asarray(data).


Solution

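  • Selecting X with iloc returns a pandas DataFrame. According to the statsmodels documentation, Logit expects both y (endog) and X (exog) to be array_like, and the values must be numeric. The error means that at least one of the inputs has dtype object. That is the case here, because after the .replace() call the target holds the strings "No" and "yes". Map the labels back to 0/1 (or skip the relabelling) and cast the data to a numeric dtype. You can use the to_numpy method to convert the DataFrame to a NumPy array.

    # Map the string labels back to 0/1; Logit needs a numeric target.
    y = data['target'].map({"No": 0, "yes": 1}).astype(float)
    model_log = sm.Logit(y, data.iloc[:, 2:].to_numpy(dtype=float)).fit()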
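
To see where the object dtype comes from, it can help to run the check the error message suggests. The sketch below is only an illustration: the toy DataFrame and its column names (id, x1, x2) are made up, and it assumes the target sits in the second column so that iloc[:, 2:] picks up only the predictors.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the asker's dataset.
data = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "target": [0, 1, 0, 1],
    "x1": [0.5, 1.5, 2.5, 3.5],
    "x2": [1, 0, 1, 0],
})

# Reproduce the relabelling from the question.
data["target"] = data["target"].replace({0: "No", 1: "yes"})

# np.asarray is the check recommended by the error message.
target_dtype_before = np.asarray(data["target"]).dtype

# Map the labels back to numbers and cast both inputs to float.
y = data["target"].map({"No": 0, "yes": 1}).astype(float)
X = data.iloc[:, 2:].astype(float)

target_dtype_after = np.asarray(y).dtype
print(target_dtype_before, "->", target_dtype_after)
print(X.dtypes.to_dict())
```

Once y and every column of X report a numeric dtype, they can be passed to sm.Logit as in the line above. If any column of X still shows up as object, it probably holds strings or mixed values and needs to be encoded or dropped first.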