Getting error "variable lengths differ (found for 'columns_features')" in R


I am applying the rpart function to a data frame named train in which all values are integers. There are too many features to type out by hand, so I built the formula programmatically.

 columns_features <- (paste(colnames(train)[31:50], collapse = "+"))
 formulas <- as.formula(train$left_eye_center_x ~ columns_features)
 tree_pred <- rpart(formulas , data = train)

Here I get the error message

 Error in model.frame.default(formula = formulas, data = train, na.action = function (x)  : variable lengths differ (found for 'columns_features')

When I inspect formulas, it contains

 train$left_eye_center_x ~ columns_features

and columns_features contains

[1] "l_1+ l_2+ l_3+ l_4+ l_5+ l_6+ l_7+ l_8+ l_9+ l_10+ l_11+ l_12+ l_13+ l_14+ l_15+ l_16+ l_17+ l_18+ l_19+ l_20"

As a check, when I enter the column names manually, it works:

 formulas <- as.formula(train$left_eye_center_x ~ l_1+ l_2+ l_3+ l_4+ l_5+ l_6+ l_7+ l_8+ l_9+ l_10+ l_11+ l_12+ l_13+ l_14+ l_15+ l_16+ l_17+ l_18+ l_19+ l_20  )
 tree_pred <- rpart(formulas , data = train)

Are the double quotes causing the error? What could be a solution? I have too many features to enter each one manually.


Solution

  • From the ?as.formula examples:

     ## Create a formula for a model with a large number of variables:
     xnam <- paste0("x", 1:25)
     (fmla <- as.formula(paste("y ~ ", paste(xnam, collapse= "+"))))
    

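    The point is that the whole formula is assembled as one string before as.formula() parses it. In your version columns_features is never expanded: R treats it as a single variable of length 1, which is why the lengths differ. In your case the following should work: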

    formulas <- as.formula(paste("train$left_eye_center_x ~", paste(colnames(train)[31:50], collapse = "+")))
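
To see why the original attempt fails, here is a minimal sketch on made-up data (the data frame toy and its columns y, l_1, l_2, l_3 are assumptions for illustration, not the asker's data). It calls model.frame() directly, since that is the function named in the error message, so it needs only base R:

```r
# Toy stand-in for `train`; all names here are illustrative assumptions.
set.seed(1)
toy <- data.frame(y = rnorm(10), l_1 = rnorm(10), l_2 = rnorm(10), l_3 = rnorm(10))

# One string holding the right-hand side, as in the question.
columns_features <- paste(colnames(toy)[2:4], collapse = "+")

# The variable name is taken literally: the formula has a single
# predictor called `columns_features`, a character vector of length 1.
bad <- y ~ columns_features
err <- tryCatch(
  model.frame(bad, data = toy),
  error = function(e) conditionMessage(e)
)
print(err)

# Pasting the whole formula as text first lets as.formula() parse the names.
good <- as.formula(paste("y ~", columns_features))
print(good)
mf <- model.frame(good, data = toy)
print(dim(mf))
```

So the quotes themselves are not the problem; the string just has to become part of the formula text before as.formula() parses it.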
    

    A workaround, instead of building the formula from column names, is to use . for all columns and pass only the feature columns as data (NB: I have never used rpart, but I am confident that this works):

    formulas <- as.formula(train$left_eye_center_x ~ .)  
    tree_pred <- rpart(formulas , data = train[,31:50])
    

    If rpart does not accept an indexed data frame, you could define a new data frame first:

    train4rpart <- train[,31:50]
    tree_pred <- rpart(formulas , data = train4rpart)
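
A variation on the subset idea, sketched on made-up data: if the response column is kept in the subset, the formula can use bare column names and . expands to every other column. The names toy, y, sub and feature_cols are hypothetical, and the sketch assumes the response is not among the selected feature columns:

```r
# Toy stand-in for `train`; names are illustrative assumptions.
set.seed(1)
toy <- data.frame(y = rnorm(10), other = rnorm(10),
                  l_1 = rnorm(10), l_2 = rnorm(10), l_3 = rnorm(10))

# Plays the role of colnames(train)[31:50].
feature_cols <- colnames(toy)[3:5]

# Response plus the chosen features, nothing else.
sub <- toy[, c("y", feature_cols)]

# terms() shows what `.` expands to for this data frame.
expanded <- attr(terms(y ~ ., data = sub), "term.labels")
print(expanded)
```

The same call shape would then be rpart(y ~ ., data = sub).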
    

    Actually, reading through ?rpart, you can skip the intermediate formula object and pass the formula directly:

    tree_pred <- rpart(train$left_eye_center_x ~ . , data = train[,31:50])
    

    OR

    tree_pred <- rpart(train$left_eye_center_x ~ . , data = train4rpart)
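
Another option, not part of the answer above, is stats::reformulate(), which builds a formula from a character vector of term labels without any manual pasting. A minimal sketch, again on made-up data (toy, y and the l_* columns are assumptions); the rpart() call is guarded so the snippet still runs where the rpart package is not installed:

```r
# Toy stand-in for `train`; all names here are illustrative assumptions.
set.seed(1)
n <- 100
toy <- data.frame(l_1 = rnorm(n), l_2 = rnorm(n), l_3 = rnorm(n))
toy$y <- 2 * toy$l_1 - toy$l_2 + rnorm(n)

# Plays the role of colnames(train)[31:50].
feature_cols <- c("l_1", "l_2", "l_3")

# Build the formula from the term labels and the response name.
fmla <- reformulate(termlabels = feature_cols, response = "y")
print(fmla)

# Fit the tree only if rpart is available.
if (requireNamespace("rpart", quietly = TRUE)) {
  tree_pred <- rpart::rpart(fmla, data = toy)
  print(class(tree_pred))
}
```

For the data in the question this would correspond to reformulate(colnames(train)[31:50], response = "left_eye_center_x") with data = train.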