I have a project, as part of a final exam, in which I have to create and train some models to detect malicious executables using data mining and machine learning techniques. I have a dataset of 14998 samples stored in two tables: one of size 14998x543 (features) and one of size 14998x1 (the classes of those samples).
I wrote some data arrangement code, but when I tried to use it with the kNN classifier I got some weird errors. I'm hoping someone here can help, as I'm new to MATLAB syntax.
Here is my code:
clear all
close all
clc
load ('C:\Users\Ζαρο-PC\Documents\MATLAB\PatRec Project\DataMist.mat');
load ('C:\Users\Ζαρο-PC\Documents\MATLAB\PatRec Project\DataMistClasses.mat');
inds= randperm(size(Dataset,1));
training = Dataset(inds(1:10000),:);
train_classes = DatasetMistClasses(inds(1:10000),:);
testing = Dataset(inds(10001:end),:);
test_classes = DatasetMistClasses(inds(10001:end),:);
c= knnclassify(testing,training,train_classes);
cp = classperf(c,test_classes);
cp.CorrectRate
And these are the errors I get:
Error using statslib.internal.grp2idx (line 44)
You cannot subscript a table using linear indexing (one subscript) or multidimensional indexing (three or more subscripts). Use a row subscript and a variable subscript.
Error in grp2idx (line 28)
[varargout{1:nargout}] = statslib.internal.grp2idx(s);
Error in knnclassify (line 86)
[gindex,groups] = grp2idx(group);
Error in PatternRegognitionLabProject (line 19)
c= knnclassify(testing,training,train_classes)
I really hope someone can solve this, as I have racked my brain trying to fix it. Thanks in advance, Dimitris
CASE CLOSED
I cannot see anything wrong with your code. I reproduced your example in MATLAB R2015a, using random numbers as data, and it worked correctly:
Dataset = rand(14998, 543);
DatasetMistClasses = randi(2, 14998, 1);
inds = randperm(size(Dataset,1));
training = Dataset(inds(1:10000), :);
train_classes = DatasetMistClasses(inds(1:10000), :);
testing = Dataset(inds(10001:end), :);
test_classes = DatasetMistClasses(inds(10001:end), :);
c = knnclassify(testing,training, train_classes);
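cp = classperf(c, test_classes);  % same call as in the question; cp must exist before it is queried
cp.CorrectRate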
>> cp
Label: ''
Description: ''
ClassLabels: [2x1 double]
GroundTruth: [4900x1 double]
NumberOfObservations: 4900
ControlClasses: 2
TargetClasses: 1
ValidationCounter: 1
SampleDistribution: [4900x1 double]
ErrorDistribution: [4900x1 double]
SampleDistributionByClass: [2x1 double]
ErrorDistributionByClass: [2x1 double]
CountingMatrix: [3x2 double]
CorrectRate: 0.511632653061225
ErrorRate: 0.488367346938776
LastCorrectRate: 0.511632653061225
LastErrorRate: 0.488367346938775
InconclusiveRate: 0
ClassifiedRate: 1
Sensitivity: 0.517758484609313
Specificity: 0.505071851225697
PositivePredictiveValue: 0.528393072895691
NegativePredictiveValue: 0.494414563508482
PositiveLikelihood: 1.046128586324198
NegativeLikelihood: 0.954797845535033
Prevalence: 0.517142857142857
DiagnosticTable: [2x2 double]
>> cp.CorrectRate
ans =
0.511632653061225
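Maybe the type of data you are using is confusing the knn function. The error message complains about subscripting a table, which suggests that at least one of your variables (probably the class labels) was loaded as a table rather than a plain numeric array. Review what your data looks like and check whether its shape or type is what you expected.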
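As a rough sketch of how you could check this, the snippet below inspects the class of each variable and converts any table to a numeric array with table2array before the data is indexed. The small random tables are only hypothetical stand-ins so that the snippet runs on its own; I am assuming your .mat files produce variables named Dataset and DatasetMistClasses, as your code suggests, and that all columns are numeric.

```matlab
% Hypothetical stand-in data (NOT the real dataset): small random tables
% that mimic variables loaded as tables from a .mat file.
Dataset = array2table(rand(20, 5));
DatasetMistClasses = array2table(randi(2, 20, 1));

% Inspect what was actually loaded.
disp(class(Dataset));
disp(class(DatasetMistClasses));

% Convert tables to plain numeric arrays; arrays are left untouched.
% This assumes every table column is numeric.
if istable(Dataset)
    Dataset = table2array(Dataset);
end
if istable(DatasetMistClasses)
    DatasetMistClasses = table2array(DatasetMistClasses);
end

% After conversion, the usual row indexing yields numeric matrices/vectors.
inds = randperm(size(Dataset, 1));
training = Dataset(inds(1:15), :);
train_classes = DatasetMistClasses(inds(1:15), :);
disp(class(train_classes));
```

If the conversion is what was missing, the rest of your script should be able to stay as it is, since knnclassify would then receive numeric inputs like in my random-number example.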
Good luck!