I have a multi-layer neural network that uses two variables from the dataset (Alcohol and Malic.Acid).
My code:
library(neuralnet)

# Reading in the wine data from last week's labs
winedata <- read.csv('/Users/ali/Documents/CS3002/Lab2/winedata2.csv', header = TRUE, sep = ",")

# Setting up my train and test data (column 1 = class, columns 2:3 = Alcohol, Malic.Acid)
winevaluesTrain <- winedata[1:65, 2:3]
wineclassesTrain <- winedata[1:65, 1]
winevaluesTest <- winedata[66:130, 2:3]
wineclassesTest <- winedata[66:130, 1]

# Normalize: scale the test set with the training set's means and standard
# deviations, so both sets are on the same scale
trainmeans <- colMeans(winevaluesTrain)
trainsds <- apply(winevaluesTrain, 2, sd)
scaledtrain <- as.data.frame(scale(winevaluesTrain))
scaledtest <- as.data.frame(scale(winevaluesTest, center = trainmeans, scale = trainsds))

# Building the architecture of my neural network; bind the class labels onto
# the scaled predictors so the formula can find them in the data frame
traindata <- cbind(Class = wineclassesTrain, scaledtrain)
set.seed(2)
NN2 <- neuralnet(Class ~ ., traindata, hidden = c(3, 3), threshold = 0.001,
                 stepmax = 1e+05, linear.output = FALSE)
plot(NN2)

# Predicted outputs for the test set
predict_testNN2 <- compute(NN2, scaledtest)
predict_outNN2 <- predict_testNN2$net.result
print(predict_outNN2)
This plots the neural network and prints out the predicted results.
The final part is to 'Calculate the Accuracy', and I'm not sure where to go from here. Asking the team, they said I'm looking for a single value that shows the accuracy of my neural network.
I'm not sure if I need a confusion matrix, or how to present a single accuracy score.
I can't see your data, but I'm assuming this is a classification problem (you're predicting a binary outcome), and the predicted results you show above are probabilities produced by your neural network model.
The next step would be to apply a decision threshold to those probabilities, i.e. decide which of them should be classified as 1 and which as 0. For example, you could choose a threshold that keeps the ratio of 0s to 1s the same as in your training data. Note that applying decisions to probabilities is a step outside the modelling process itself.
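A minimal sketch of that step in R, assuming predict_outNN2 holds one probability per test row and using a simple 0.5 cut-off (a placeholder value, not something derived from your data):

# Classify rows with predicted probability above 0.5 as 1, the rest as 0;
# 0.5 is just an illustrative threshold, adjust it to suit your problem
predicted_classes <- ifelse(as.vector(predict_outNN2) > 0.5, 1, 0)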
Once you have a binary classification for each row in your test data, you can compare those predicted classifications to the actual classifications, typically in the form of a confusion matrix. In classification problems, accuracy is defined as the number of correct predictions divided by the total number of predictions, which gives you the single value you're after.
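In R that could look something like this, assuming the predicted_classes vector from the sketch above and that wineclassesTest holds the actual 0/1 labels:

# Confusion matrix: predicted classes against actual classes
conf_matrix <- table(Predicted = predicted_classes, Actual = wineclassesTest)
print(conf_matrix)

# Accuracy: correct predictions (the diagonal) divided by all predictions
accuracy <- sum(diag(conf_matrix)) / sum(conf_matrix)
print(accuracy)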
Note that accuracy is not normally a very good indicator of model performance, especially with imbalanced datasets. It's better to assess your model based on the probabilities it produces rather than the classifications that follow from your decision-making process. Look into Brier scores as an alternative, for example.
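The Brier score is just the mean squared difference between the predicted probabilities and the actual 0/1 outcomes, so it rewards well-calibrated probabilities (lower is better). A sketch under the same assumptions as above:

# Brier score: mean squared error of the predicted probabilities (lower is better)
brier <- mean((as.vector(predict_outNN2) - wineclassesTest)^2)
print(brier)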