machine-learning vectorization octave logistic-regression multiclass-classification

Vectorizing the detection of unique labels in a vector (dataset) in Octave for multi-class logistic regression

While implementing logistic regression with multiple features and multiple classes (my chosen dataset has m > 100 samples, each labelled with one of the classes 1, 2, 3, 4 and 5), I tried to find the number of unique labels/classes and also store them as a vector. I wrote the code below, with Y as a column vector of size (m,1):

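m = size(Y, 1);        # number of samples
classes = [Y(1,1)];    # initialize classes with the first label
for i = 2:m
    count = 0;         # how many known classes match Y(i)
    for j = 1:length(classes)
        if Y(i,1) == classes(j,1)
            count = count + 1;
        end;
    end
    if count == 0      # label not seen before: append it
        classes = [classes; Y(i,1)];
    end
end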

This gives me the list of unique labels in the vector Y. However, I was wondering if there is a better way to write this (the code above looks childish to me), especially using vectorization. Any suggestions are welcome. Thanks.

Solution

If the purpose of the code is just to generate a list of the unique values in Y, you can simply use unique(Y). For example:

>> m = 10;
>> Y = floor(rand(m,1)*5+1)
Y =

   5
   1
   5
   4
   2
   2
   1
   5
   1
   4

>> unique(Y)
ans =

   1
   2
   4
   5
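Since the question also asks for the number of classes, that is just the length of the result of unique. A minimal sketch, using a fixed copy of the example labels above instead of rand so the result is reproducible:

```octave
% Fixed example labels (same values as the random session above)
Y = [5; 1; 5; 4; 2; 2; 1; 5; 1; 4];

classes    = unique (Y);       % sorted distinct labels: [1; 2; 4; 5]
numClasses = numel (classes);  % number of distinct labels: 4

printf ("%d distinct classes\n", numClasses);
```

For this Y it prints "4 distinct classes".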

Note, however, that your loop returns the classes in the order in which they first appear in Y, e.g.:

classes = 

   5
   1
   4
   2

If that order is important, you'll need something like this:

>> [sortedClasses idx] = unique(Y,"first")
sortedClasses =

   1
   2
   4
   5

idx =

   2
   5
   4
   1

>> unsortedClasses = Y(sort(idx))
unsortedClasses =

   5
   1
   4
   2

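Both unique and sort are built-in, vectorized functions, so they are fast. Your loop, by contrast, compares every sample against every class found so far, and it grows classes by one element each time a new label appears. Each growth reallocates and copies the whole array, which would impose significant overhead if you had a very large number of classes.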
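If you prefer to keep a loop (for example, to follow the logic step by step), a middle ground is to preallocate classes once and replace the inner loop with a vectorized any(...) test. This is only a sketch illustrating the point above, not a replacement for unique; Y is again the fixed example data:

```octave
Y = [5; 1; 5; 4; 2; 2; 1; 5; 1; 4];
m = numel (Y);

classes = zeros (m, 1);   % preallocate: there can be at most m distinct labels
k = 0;                    % number of distinct labels found so far
for i = 1:m
  % vectorized check against the labels found so far (empty when k == 0)
  if ~any (classes(1:k) == Y(i))
    k = k + 1;
    classes(k) = Y(i);    % store in place, no reallocation
  end
end
classes = classes(1:k)    % trim to the labels actually found: [5; 1; 4; 2]
```

Like your original loop, this keeps the labels in order of first appearance.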