According to the documentation in help(rpart), there is a subset option, which is an "optional expression saying that only a subset of the rows of the data should be used in the fit."
How exactly do I go about using this option?
library(rpart)
fit <- rpart(Kyphosis ~ Age + Number + Start,
             data = kyphosis,
             subset = sample(1:nrow(kyphosis), 20))
In the above code, I randomly sampled 20 row indices from the kyphosis data. Is this the correct usage?
Yes, this is OK. With subset, you can also:
- give the row indices of the data.frame directly: subset=1:21
- give a logical condition on the columns of the data: subset=(Age<50)
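To check that each form does what you expect, here is a minimal sketch that fits the same model three ways and compares the number of rows that reached the root node. The object names (idx, fit_random, fit_index, fit_cond) and the seed value are my own choices for illustration, not anything rpart requires.

```r
library(rpart)

# Assumption: a fixed seed, only so the random subset is reproducible.
set.seed(42)

# 1. Random subset: a vector of 20 row indices.
idx <- sample(nrow(kyphosis), 20)
fit_random <- rpart(Kyphosis ~ Age + Number + Start,
                    data = kyphosis, subset = idx)

# 2. Explicit row indices.
fit_index <- rpart(Kyphosis ~ Age + Number + Start,
                   data = kyphosis, subset = 1:21)

# 3. Logical condition, evaluated on the columns of `data`.
fit_cond <- rpart(Kyphosis ~ Age + Number + Start,
                  data = kyphosis, subset = Age < 50)

# The first row of fit$frame is the root node; its n is the
# number of observations actually used in the fit.
fit_random$frame$n[1]
fit_index$frame$n[1]
fit_cond$frame$n[1]
sum(kyphosis$Age < 50)  # should match the root n of fit_cond
```

Note that the condition in the third call refers to Age without the kyphosis$ prefix, because subset is evaluated inside the data frame passed as data.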