
R: how to use the subset option in rpart?


According to help(rpart), the function has a subset argument, described as an "optional expression saying that only a subset of the rows of the data should be used in the fit."

How exactly do I go about using this option?

library(rpart)
fit <- rpart(Kyphosis ~ Age + Number + Start,
             data = kyphosis,
             subset = sample(1:nrow(kyphosis), 20))

In the above code, I randomly sampled 20 row indices from the kyphosis data. Is this the correct usage?


Solution

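  • Yes, this usage is correct. subset takes a specification of the rows of data to use, and the expression is evaluated within data, so column names can be used directly. Besides a random sample of indices, you can also:

    • Pick rows explicitly by index: subset = 1:21
    • Pick rows by a condition on one or more variables: subset = (Age < 50)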
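
As a rough check, the three forms can be run side by side on the kyphosis data that ships with rpart. This is only a sketch: the set.seed() call and the object names (idx, fit_random, fit_rows, fit_cond, n_random, n_rows, n_cond) are additions for illustration and are not part of the question. The root node's n in fit$frame shows how many rows each fit actually used.

```r
library(rpart)

# Fix the seed so the random subset is reproducible
# (an addition for illustration; the question does not set a seed)
set.seed(42)

# 1. Random subset of 20 row indices, as in the question
idx <- sample(seq_len(nrow(kyphosis)), 20)
fit_random <- rpart(Kyphosis ~ Age + Number + Start,
                    data = kyphosis, subset = idx)

# 2. Explicit row indices
fit_rows <- rpart(Kyphosis ~ Age + Number + Start,
                  data = kyphosis, subset = 1:21)

# 3. Logical condition; Age is looked up inside `data`
fit_cond <- rpart(Kyphosis ~ Age + Number + Start,
                  data = kyphosis, subset = (Age < 50))

# The first row of fit$frame is the root node; its n is the
# number of observations that went into the fit
n_random <- fit_random$frame$n[1]
n_rows   <- fit_rows$frame$n[1]
n_cond   <- fit_cond$frame$n[1]

print(c(random = n_random, rows = n_rows, cond = n_cond))
```

Note that with only about 20 rows and the default rpart.control() settings, the resulting tree may have few or no splits, so a small subset like this is mainly useful for confirming that subset behaves as expected.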