r, machine-learning, one-hot-encoding

Why do we use -1 in the model.matrix function in R? Is it for one-hot encoding, or is there another reason?


Why do we include -1 in the formula passed to the model.matrix function from the stats package?

training_matrix <- model.matrix(Survived ~ . - 1, data = training)

The standard titanic dataset is used in this case.

I have also seen documentation saying that one-hot encoding can be performed using model.matrix with the -1 notation, provided the factor and numeric columns in the dataset are declared properly.

The code is as follows:

data_1_matrix <- model.matrix(~ . - 1, data = data_1)

What does this -1 do exactly?


Solution

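  • The -1 removes the constant (intercept) column from your model matrix. If you instead use

    training_matrix <- model.matrix(Survived ~ ., data = training)

    a column of ones is included and one level of each factor is omitted from the model matrix, so that the dummy columns are not perfectly collinear with the constant.

    Which is preferable is up to the user: if you keep the constant, each factor has a 'reference class' (the omitted level) against which the other levels are measured. If you drop it with -1, the first factor in the formula keeps a column for every level and so has no reference class. Note that with the default contrasts, any further factors still have one level omitted, so -1 alone gives a full one-hot encoding only for the first factor.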
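The effect of -1 is easy to check on a small example. The sketch below uses a made-up data frame (the name df and its columns sex, class and age are assumptions for illustration, not the Titanic data) and compares the column names produced with and without the intercept. The last call shows one common way to get a column for every level of every factor, by passing full identity contrasts through the contrasts.arg argument of model.matrix.

```r
# Hypothetical toy data: two factors and one numeric column
df <- data.frame(
  sex   = factor(c("female", "male", "male", "female")),
  class = factor(c("1st", "2nd", "3rd", "1st")),
  age   = c(29, 35, 4, 58)
)

# With the intercept: a column of ones, and every factor drops its first level
with_intercept <- model.matrix(~ ., data = df)
print(colnames(with_intercept))

# Without the intercept: no column of ones, and the first factor (sex)
# keeps all of its levels; class still drops its first level
without_intercept <- model.matrix(~ . - 1, data = df)
print(colnames(without_intercept))

# Full one-hot encoding: ask for identity (non-reduced) contrasts
# for every factor column in the data frame
factor_cols <- sapply(df, is.factor)
full_one_hot <- model.matrix(
  ~ . - 1,
  data = df,
  contrasts.arg = lapply(df[, factor_cols, drop = FALSE],
                         contrasts, contrasts = FALSE)
)
print(colnames(full_one_hot))
```

In this sketch the second matrix should contain both sexfemale and sexmale but no class1st column, while the third contains a column for every level of both factors. A full one-hot matrix is perfectly collinear once a constant is added, so it is better suited to models that do not fit an intercept on top of it, or that regularise.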