
Carrying out an LDA and predicting on test data


I have the following dataset:

library(MASS) 
install.packages("gclus")
data(wine)
View(wine)
install.packages("car")

I want to split it 70:30 into a training set and a test set, and then carry out an LDA on each of the following subsets of variables:

wine[c("Class", "Malic", "Hue", "Magnesium")] 
wine[c("Class","Hue", "Alcalinity", "Phenols", "Malic", "Magnesium", "Intensity", "Nonflavanoid","Flavanoids")]

Lastly, I want to use the function predict to predict the class memberships for the test data and compare the predictions with the true class memberships.

I am getting some errors while doing it, so any help would be appreciated.


Solution

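  • First, split the data into a training and a test set in a roughly 70:30 ratio. Because sample() assigns each row to a group independently with probabilities 0.7 and 0.3, the split is approximate rather than exact: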

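    library(MASS)   # provides lda()
    library(gclus)  # provides the wine data
    data(wine)      # load the dataset into the workspace
    set.seed(123)   # make the random split reproducible
    ind <- sample(2, nrow(wine), replace = TRUE, prob = c(0.7, 0.3))
    training <- wine[ind == 1, ]
    testing <- wine[ind == 2, ]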
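
    If an exact 70:30 split is needed, one option is to draw a fixed number of row indices without replacement instead. The following is only a minimal sketch: split_exact is a hypothetical helper (it is not part of MASS or gclus), and it is demonstrated on the built-in iris data so that it runs without extra packages.

    ```r
    # Hypothetical helper: split a data frame so that an exact share `p`
    # of its rows (after rounding) ends up in the training set.
    split_exact <- function(df, p = 0.7, seed = 123) {
      set.seed(seed)                                  # reproducible draw
      n_train <- round(p * nrow(df))                  # fixed training-set size
      train_idx <- sample(nrow(df), size = n_train)   # row numbers, no replacement
      test_idx <- setdiff(seq_len(nrow(df)), train_idx)
      list(train = df[train_idx, , drop = FALSE],
           test  = df[test_idx, , drop = FALSE])
    }

    # Demonstration on the built-in iris data (150 rows)
    parts <- split_exact(iris, p = 0.7)
    nrow(parts$train)  # 105
    nrow(parts$test)   # 45
    ```

    With the wine data loaded, split_exact(wine) could be used in the same way; the resulting training and test sets would differ from the ones used in the rest of this answer, so the confusion matrix and accuracy would differ too.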
    

    Next, use the function lda to fit a linear discriminant analysis for each of the two sets of predictors:

    model1 <- lda(Class ~ Malic + Hue + Magnesium, training)
    model2 <- lda(Class ~ Hue + Alcalinity + Phenols + Malic + Magnesium + Intensity + Nonflavanoid + Flavanoids, training)
    

    Finally, predict on the test set and compare the predictions with the true classes in a confusion matrix (shown here for model1):

    p1 <- predict(model1, testing)$class
    tab <- table(Predicted = p1, Actual = testing$Class)
    tab
    

    Output:

             Actual
    Predicted  1  2  3
            1 13  3  0
            2  5 14  0
            3  0  2 11
    

    The accuracy, i.e. the share of test observations on the diagonal of the table (38 of 48), is:

    cat("Accuracy is:", sum(diag(tab))/sum(tab))
    
    Accuracy is: 0.7916667
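
    The same prediction and comparison steps apply to model2. To avoid repeating them for every formula, they could be wrapped in a small function. This is a sketch under stated assumptions: lda_eval is a hypothetical helper name, and the demonstration uses the built-in iris data with an arbitrary split rather than the wine data, so no wine results are implied.

    ```r
    library(MASS)  # lda() and its predict() method

    # Hypothetical helper: fit an LDA on `train`, predict on `test`, and
    # return the confusion matrix together with the accuracy.
    lda_eval <- function(formula, train, test) {
      fit <- lda(formula, data = train)
      pred <- predict(fit, newdata = test)$class
      actual <- test[[all.vars(formula)[1]]]  # response = first variable in the formula
      list(table = table(Predicted = pred, Actual = actual),
           accuracy = mean(as.character(pred) == as.character(actual)))
    }

    # Demonstration on iris with an illustrative 105/45 split
    set.seed(123)
    idx <- sample(nrow(iris), size = 105)
    res <- lda_eval(Species ~ Sepal.Length + Sepal.Width, iris[idx, ], iris[-idx, ])
    res$table     # confusion matrix on the 45 held-out rows
    res$accuracy  # share of correct predictions
    ```

    With the objects from this answer, lda_eval(Class ~ Malic + Hue + Magnesium, training, testing) should reproduce the table and accuracy shown for model1, and passing the longer formula of model2 gives the corresponding result for the second subset.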