MATLAB KNN classifier error (college project)


I have a project, as part of a final exam, in which I have to create and train some models to detect malicious executables using data mining and machine learning techniques. I have a dataset of 14998 samples stored in two tables: one of 14998x543 (the features) and one of 14998x1 (the classes of those samples).

I wrote some data arrangement code, but when I tried to use it with the KNN classifier I got some weird errors. I am hoping someone here can help, as I'm new to MATLAB syntax.

Here is my code:

clear all
close all
clc

load('C:\Users\Ζαρο-PC\Documents\MATLAB\PatRec Project\DataMist.mat');
load('C:\Users\Ζαρο-PC\Documents\MATLAB\PatRec Project\DataMistClasses.mat');

inds = randperm(size(Dataset,1));
training = Dataset(inds(1:10000),:);
train_classes = DatasetMistClasses(inds(1:10000),:);
testing = Dataset(inds(10001:end),:);
test_classes = DatasetMistClasses(inds(10001:end),:);

c = knnclassify(testing,training,train_classes);

cp = classperf(c,test_classes);
cp.CorrectRate

And these are the errors I get:

Error using statslib.internal.grp2idx (line 44)
You cannot subscript a table using linear indexing (one subscript) or multidimensional indexing (three or more subscripts). Use a row subscript and a variable subscript.

Error in grp2idx (line 28)
[varargout{1:nargout}] = statslib.internal.grp2idx(s);

Error in knnclassify (line 86)
[gindex,groups] = grp2idx(group);

Error in PatternRegognitionLabProject (line 19)
c= knnclassify(testing,training,train_classes)

I really hope someone can solve this, as I busted my brain open trying to fix it. Thanks in advance, Dimitris

CASE CLOSED


Solution

  • I cannot see anything wrong with your code. I reproduced your example in MATLAB R2015a, using random numbers as data, and it ran correctly:

    Dataset = rand(14998, 543);
    DatasetMistClasses = randi(2, 14998, 1);
    
    inds = randperm(size(Dataset,1));
    training = Dataset(inds(1:10000), :);
    train_classes = DatasetMistClasses(inds(1:10000), :);
    
    testing = Dataset(inds(10001:end), :);
    test_classes = DatasetMistClasses(inds(10001:end), :);
    
    c = knnclassify(testing,training, train_classes);
    
    cp = classperf(c, test_classes);
    
    >> cp
                            Label: ''
                      Description: ''
                      ClassLabels: [2x1 double]
                      GroundTruth: [4900x1 double]
             NumberOfObservations: 4900
                   ControlClasses: 2
                    TargetClasses: 1
                ValidationCounter: 1
               SampleDistribution: [4900x1 double]
                ErrorDistribution: [4900x1 double]
        SampleDistributionByClass: [2x1 double]
         ErrorDistributionByClass: [2x1 double]
                   CountingMatrix: [3x2 double]
                      CorrectRate: 0.511632653061225
                        ErrorRate: 0.488367346938776
                  LastCorrectRate: 0.511632653061225
                    LastErrorRate: 0.488367346938775
                 InconclusiveRate: 0
                   ClassifiedRate: 1
                      Sensitivity: 0.517758484609313
                      Specificity: 0.505071851225697
          PositivePredictiveValue: 0.528393072895691
          NegativePredictiveValue: 0.494414563508482
               PositiveLikelihood: 1.046128586324198
               NegativeLikelihood: 0.954797845535033
                       Prevalence: 0.517142857142857
                  DiagnosticTable: [2x2 double]
    
    >> cp.CorrectRate
    
    ans =
    
       0.511632653061225
    

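    Maybe the type of data you are using is confusing the KNN function. The error message complains about subscripting a table, so it is worth checking whether Dataset and DatasetMistClasses are loaded as MATLAB tables rather than plain numeric arrays like the ones in my test. Review what your data looks like and see whether its shape or type is not what you expected or intended.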

    Good luck!
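
To make the data-type suggestion above concrete, here is a minimal sketch of how the check and conversion could look. It assumes that the two loaded variables are MATLAB tables with purely numeric columns, which is what the error text suggests but which I cannot confirm without the .mat files. The small random tables and the 150/50 split are hypothetical stand-ins for the real data, and accuracy is computed with a plain comparison instead of classperf to keep the example short.

```matlab
% Hypothetical stand-ins for the loaded variables, built as tables
% to mimic the situation the error message points to.
rng(0);                                              % reproducible random data
Dataset = array2table(rand(200, 5));                 % features stored as a table
DatasetMistClasses = array2table(randi(2, 200, 1));  % class labels stored as a table

% Inspect the types first; 'table' here would explain the grp2idx error.
disp(class(Dataset));
disp(class(DatasetMistClasses));

% Convert tables to plain numeric arrays (assumes all columns are numeric).
if istable(Dataset)
    Dataset = table2array(Dataset);
end
if istable(DatasetMistClasses)
    DatasetMistClasses = table2array(DatasetMistClasses);
end

% Same random split as in the question, scaled down to the toy data.
inds = randperm(size(Dataset, 1));
training      = Dataset(inds(1:150), :);
train_classes = DatasetMistClasses(inds(1:150), :);
testing       = Dataset(inds(151:end), :);
test_classes  = DatasetMistClasses(inds(151:end), :);

% knnclassify now receives numeric arrays instead of tables.
c = knnclassify(testing, training, train_classes);

% Fraction of test samples whose predicted class matches the true class.
accuracy = mean(c == test_classes)
```

With the real data, only the two istable/table2array blocks would need to go right after the load calls; the rest of the original script can stay as it is. On random data the accuracy should hover around chance level, as in the run shown above.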