Obtaining Predictions for New Observations (R Programming Language)


I am working in R. I fit a C5.0 decision tree to the pima dataset (loaded from the pdp package) to predict whether the "diabetes" column is "pos" or "neg":

#load libraries
library(pdp)
library(C50)

#load data
data(pima)

#remove na's
new_data = na.omit(pima)

#format data
new_data$age = as.factor(ifelse(new_data$age >30, "1", "0"))
new_data$pregnant = as.factor(ifelse(new_data$pregnant >2, "1", "0"))

#run model
tree_mod <- C5.0(x = new_data[, 1:8], rules = TRUE, y = new_data$diabetes)

Here is my question: I am trying to obtain a column of predictions made by the model for new observations. I then want to append this column to the original dataset.

Following the C5.0 vignette (https://cran.r-project.org/web/packages/C50/vignettes/C5.0.html), I used the "predict" function:

#pretend this is new data
new = new_data[1:10,]

#run predictions
pred = predict(tree_mod, newdata = new[, 1:8])

But this produces the following error:

Error in x[j] : invalid subscript type 'closure'

Can anyone please show me how to do this?

I am trying to create something like this, with a new "prediction_made_by_model" column:

   pregnant glucose pressure triceps insulin mass pedigree age diabetes prediction_made_by_model
4         0      89       66      23      94 28.1    0.167   0      neg                      pos
5         0     137       40      35     168 43.1    2.288   1      pos                      neg
7         1      78       50      32      88 31.0    0.248   0      pos                      neg
9         0     197       70      45     543 30.5    0.158   1      pos                      pos
14        0     189       60      23     846 30.1    0.398   1      pos                      neg
15        1     166       72      19     175 25.8    0.587   1      pos                      pos

Thanks!


Solution

  • I was able to figure it out. For some reason this was not working before, but the same call now runs, and its result can be assigned to the data as a new column:

    #run predictions
    pred = predict(tree_mod, newdata = new[, 1:8])

    #append predictions as a new column
    new$prediction_made_by_model = pred
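
For reference, here is a minimal end-to-end sketch of the steps above, assuming the C50 and pdp packages are installed. The names predictors and new_obs are introduced here for illustration and do not appear in the original post; new_obs is used instead of new only to avoid reusing the name of the new() function from the methods package. It is not a diagnosis of the original error, just one clean way to run the same workflow in a fresh session.

```r
# Sketch only: assumes the C50 and pdp packages are installed.
library(C50)  # C5.0() and its predict() method
library(pdp)  # provides the pima data set

data(pima)
new_data <- na.omit(pima)

# Same recoding as in the question
new_data$age      <- as.factor(ifelse(new_data$age > 30, "1", "0"))
new_data$pregnant <- as.factor(ifelse(new_data$pregnant > 2, "1", "0"))

# Select predictors by name rather than by position (illustrative choice)
predictors <- setdiff(names(new_data), "diabetes")

tree_mod <- C5.0(x = new_data[, predictors],
                 y = new_data$diabetes,
                 rules = TRUE)

# Stand-in for new observations, as in the question
new_obs <- new_data[1:10, ]
stopifnot(is.data.frame(new_obs))

# Predicted class for each row, appended as a new column
new_obs$prediction_made_by_model <- predict(tree_mod,
                                            newdata = new_obs[, predictors],
                                            type = "class")

print(new_obs)
```

Because predict() returns one value per row of newdata, in the same order, assigning the result directly as a column keeps each prediction aligned with its observation.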