Tags: statsmodels, categorical-data

Python OLS with categorical label


I have a dataset where I am trying to predict the type of car based on a number of features. I would like to run an OLS regression to see

import numpy as np
import statsmodels.api as sm

X  = features 
# where 0 = sedan, 1 = minivan , etc 
y = [0,0,1,0,2,....]

X2 = sm.add_constant(np.array(X))
est = sm.OLS(np.array(y), X2)
est2 = est.fit()

I don't think this is correct, because I am not specifying that y is categorical, and I feel the functional form should change. Does anyone have any insight on this?


Solution

  • Ordinary least squares regression assumes a numerical dependent variable, so you cannot use it to predict categorical outcomes. Coding the car types as 0, 1, 2 would impose an arbitrary order and equal spacing on the classes.

    To predict categorical outcomes with a regression model, use multinomial logistic regression, for example with sklearn.
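
    As a rough illustration of that suggestion, here is a minimal sketch using sklearn's LogisticRegression. The data is synthetic and purely hypothetical (three car types, two made-up features), standing in for the asker's `features` and `y`. It also assumes a recent scikit-learn version, where the default lbfgs solver fits a multinomial (softmax) model when the target has more than two classes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in data: 3 car types, 2 numeric features, 50 rows per type.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])  # one cluster per class
y = np.repeat([0, 1, 2], 50)  # 0 = sedan, 1 = minivan, 2 = another type
X = centers[y] + rng.normal(scale=0.5, size=(150, 2))

# The integer labels are treated as unordered class names, not as numbers.
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)

probs = clf.predict_proba(X[:5])  # one probability per class for each row
preds = clf.predict(X)            # most likely class label for each row
accuracy = clf.score(X, y)        # mean accuracy on the training data

print(probs.round(3))
print(accuracy)
```

    Unlike OLS, the model returns a probability for each class rather than a single number on the 0, 1, 2 scale, so relabeling the classes would not change the fit.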