I am trying to use the tree package in R (from CRAN). I am loading my CSV file as follows:
data <- read.csv("C:/data2.csv", header = FALSE, sep = ";",dec = ".")
The last column in the file represents the class. Is that correct?
My question is: should the class be in the first or the last column of the file? Thank you.
V1 V2 V3 V4 V5 CLASS
'X0000002' NULL 0 NULL 'BETA' 1
'Y0034195' NULL 2 NULL 'INTERNAL' 1
'X0000001' NULL 0 NULL 'BETA' 2
'X0000002' NULL 0 NULL 'BETA' 2
'X0000002' NULL 0 NULL 'BETA' 2
'Y0034195' NULL 0 NULL 'INTERNAL' 2
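Because rpart picks the response by name through the formula, the class column does not have to be first or last. A minimal sketch, assuming the built-in iris data set as a stand-in for data2.csv: it moves the class column (Species) from last to first and checks that both fits give the same predictions.

```r
library(rpart)

# iris keeps its class column (Species) last
class_last <- iris
# Same data with Species moved to the first column; the predictors keep their order
class_first <- iris[, c(5, 1:4)]

fit_last  <- rpart(Species ~ ., data = class_last,  method = "class")
fit_first <- rpart(Species ~ ., data = class_first, method = "class")

# The formula names the response, so column position does not change the tree
pred_last  <- predict(fit_last,  type = "class")
pred_first <- predict(fit_first, type = "class")
print(identical(pred_last, pred_first))
```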
Correction: I have 24 descriptors, V1...V24, and V24 is the class. I used the rpart library from CRAN:
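library(rpart)
# With header = FALSE, read.csv names the columns V1, V2, ..., V24
data <- read.csv("C:/data2.csv", header = FALSE, sep = ";", dec = ".")
d1 <- data[, 1:24]  # "d1 -> data[, 1:24]" would instead try to assign d1 into data
# Column names are case-sensitive: use V1...V24, not v1...v24
fit <- rpart(V24 ~ V1 + V2 + V3 + V4 + V5 + V6 + V7 + V8 + V9 + V10 + V11 + V12 +
               V13 + V14 + V15 + V16 + V17 + V18 + V19 + V20 + V21 + V22 + V23,
             data = d1)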
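With header = FALSE, read.csv names the columns V1, V2, ... with a capital V, and R names are case-sensitive, so a formula written with v24 cannot find the column. A small sketch, assuming a made-up four-column, semicolon-separated sample (passed through read.csv's text argument instead of a file) in the same layout as data2.csv:

```r
library(rpart)

# Hypothetical stand-in for data2.csv: no header row, ';' separator,
# class in the last column
csv_text <- "1.2;0;3.1;1
0.7;2;2.8;1
1.9;0;3.5;2
2.3;1;4.0;2
2.1;0;3.9;2
0.5;2;2.2;1"

d1 <- read.csv(text = csv_text, header = FALSE, sep = ";", dec = ".")
print(names(d1))  # default names use a capital V

# Lowercase names do not exist in d1, so this call fails
bad <- try(rpart(v4 ~ v1 + v2 + v3, data = d1), silent = TRUE)
print(inherits(bad, "try-error"))

# The real (capitalised) names work; method = "class" treats V4 as a class label
fit <- rpart(V4 ~ V1 + V2 + V3, data = d1, method = "class")
print(fit)
```

On a sample this small the tree is just the root node; the point is only that the names must match.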
Example solution:
# Regression Tree Example
library(rpart)
data <- read.csv("C:/data2.csv", header = TRUE, sep = ";")
fit = rpart(linkId ~ .,method = "anova",data = data)
printcp(fit) # display the results
plotcp(fit) # visualize cross-validation results
summary(fit) # detailed summary of splits
# create additional plots
par(mfrow=c(1,2)) # two plots on one page
rsq.rpart(fit) # visualize cross-validation results
# plot tree
plot(fit, uniform=FALSE,
main="Regression Tree for Mileage ",compress = TRUE )
text(fit, use.n=TRUE, all=TRUE, cex=.8)
# create attractive PostScript plot of tree
post(fit, file = "c:/tree2.ps",
     title = "Regression Tree")
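The example above grows a regression tree (method = "anova"), which treats the response as a number. When the last column holds class labels, as in the question, rpart's method = "class" builds a classification tree instead. A minimal sketch, again assuming the built-in iris data as a stand-in for data2.csv:

```r
library(rpart)

# Classification tree: the response is a factor of class labels
fit <- rpart(Species ~ ., data = iris, method = "class")

printcp(fit)  # complexity table with cross-validated error

# Predicted class for each row, compared with the true labels
pred <- predict(fit, type = "class")
print(table(observed = iris$Species, predicted = pred))

# Share of training rows classified correctly
accuracy <- mean(pred == iris$Species)
print(accuracy)
```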
Check the help with ?formula; you should get the basics easily.
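Instead of typing 23 predictor names by hand, a formula can also be built from a character vector with reformulate from the stats package. A small sketch, assuming the V1...V24 names that read.csv produces with header = FALSE:

```r
# Predictor names V1..V23 and response V24, as produced by read.csv(header = FALSE)
predictors <- paste0("V", 1:23)
f <- reformulate(predictors, response = "V24")

print(f)            # the full V24 ~ V1 + ... + V23 formula
print(all.vars(f))  # response first, then the predictors in order
```

The resulting f can then be passed as rpart(f, data = d1) wherever the long formula was written out.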
The names in the formula must match the column names of your data frame exactly (R is case-sensitive); otherwise R cannot interpret the formula. You can also use the "." shortcut to include all other columns as predictors:
fit = rpart(CLASS ~ ., data = data)
Or
fit = rpart(data = data, formula = CLASS ~ .)
If you pass the arguments in a different order from rpart's signature (formula first, then data), you must name them; there is no need to use the second form, though.
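Because rpart's first two parameters are formula and data, the positional call and the fully named call in reverse order fit the same model. A short check, assuming a hypothetical data frame with a CLASS column built from iris:

```r
library(rpart)

# Hypothetical data frame whose class column is called CLASS, as in the question
data <- iris
names(data)[names(data) == "Species"] <- "CLASS"

# Positional: formula first, then data
fit_pos <- rpart(CLASS ~ ., data = data)
# Named arguments can come in any order
fit_named <- rpart(data = data, formula = CLASS ~ .)

# Same splits, so the same fitted classes
same <- identical(predict(fit_pos, type = "class"),
                  predict(fit_named, type = "class"))
print(same)
```

Since CLASS is a factor here, rpart chooses method = "class" on its own in both calls.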