matlab statistics regression linear-regression glm

Matlab - Stepwise GLM with Categoricals

I have a table of 85 predictors, a mix of numerical, logical, ordinal and nominal (one-hot encoded) variables. They predict a single outcome variable, finalScore, which ranges from 0 to 1. I'm running a stepwise GLM using:

model2 = stepwiseglm(predictors, finalScore);

Each predictor's header indicates which of the four types it is, and I'm wondering whether there is a way to tell the model about these different types. This page suggests there is for categoricals, but so far I have not found anything that covers each of the four types I have.

Solution

Per the Generalized Linear Models walk-through:

For a table or dataset array tbl, fitting functions assume that these data types are categorical:

Logical

Categorical (nominal or ordinal)

Character array

As long as each column in the input table is stored as the appropriate type, you shouldn't have to specify anything further. To ensure this, convert nominal columns with categorical(), ordinal columns with ordinal() (or categorical(..., 'Ordinal', true)), and logical columns with logical(). Numeric columns left as doubles are treated as continuous.
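As a rough sketch of that conversion step, assuming a toy table whose column names (num_age, log_flag, nom_group, ord_level) are made up for illustration and are not from the question:

```matlab
% Hypothetical toy table; all variable names here are illustrative only.
rng(0);                                  % reproducible toy data
n = 100;
tbl = table();
tbl.num_age    = randn(n, 1);            % numeric predictor, left as double
tbl.log_flag   = double(rand(n, 1) > 0.5); % logical stored as 0/1 doubles
tbl.nom_group  = randi(3, n, 1);         % nominal stored as integer codes
tbl.ord_level  = randi(4, n, 1);         % ordinal stored as integer codes
tbl.finalScore = rand(n, 1);             % response in [0, 1]

% Convert each coded column to a type the fitting functions treat as categorical.
tbl.log_flag  = logical(tbl.log_flag);
tbl.nom_group = categorical(tbl.nom_group);
tbl.ord_level = categorical(tbl.ord_level, 'Ordinal', true);

% Quick sanity check on the resulting column classes.
disp(varfun(@class, tbl, 'OutputFormat', 'cell'));
```

Once the columns are typed like this, the table can be passed to stepwiseglm with the response named through 'ResponseVar'.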

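You can also state explicitly which variables are categorical with stepwiseglm(..., 'CategoricalVars', logical([0 1 0 1 0 0 0 ...])); a logical mask, a numeric index vector such as [2 4], or a list of variable names are all accepted. If your input columns are already typed correctly, though, this should be redundant anyway.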
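For instance, when the predictors are a plain numeric matrix rather than a typed table, the flag is the only way to mark a column as categorical. A minimal sketch, assuming made-up data in which the second column holds category codes:

```matlab
% Toy data: column 2 holds integer category codes, the others are continuous.
rng(1);
n = 100;
X = [randn(n, 1), randi(3, n, 1), randn(n, 1)];
y = rand(n, 1);

% Logical mask with one entry per predictor column.
isCat = logical([0 1 0]);

% Start from a constant model and allow at most linear terms.
mdl = stepwiseglm(X, y, 'constant', 'Upper', 'linear', ...
    'CategoricalVars', isCat, 'Verbose', 0);
```

The 'Upper' and 'Verbose' options are only there to keep the toy run small and quiet; they are not needed for the categorical handling itself.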

Once the model is built, you can verify that the categorical variables and their ranges were handled as intended by checking model2.VariableInfo.
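A small sketch of that check, again on invented data (the names x_num, x_nom and x_log are hypothetical):

```matlab
% Toy table with one numeric, one nominal and one logical predictor.
rng(2);
n = 80;
tbl = table(randn(n, 1), categorical(randi(3, n, 1)), rand(n, 1) > 0.5, rand(n, 1), ...
    'VariableNames', {'x_num', 'x_nom', 'x_log', 'finalScore'});

model2 = stepwiseglm(tbl, 'constant', 'Upper', 'linear', ...
    'ResponseVar', 'finalScore', 'Verbose', 0);

% One row per variable: its class, whether it was treated as categorical,
% and whether the stepwise search kept it in the final model.
info = model2.VariableInfo;
disp(info(:, {'Class', 'IsCategorical', 'InModel'}));
```

The IsCategorical column shows how each variable was interpreted, and the Range column (not displayed here) lists the levels or the min and max seen during fitting.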