While implementing multi-class logistic regression with multiple features, I needed to find the number of unique labels/classes in my data and collect them into a vector. My data set has m (> 100) samples with class labels 1, 2, 3, 4 and 5. With Y as a column vector of size (m,1), I wrote the code below:
classes = [Y(1,1)]; # Initializing classes
for i = 2:m
  count = 0;
  for j = 1:length(classes)
    if Y(i,1) == classes(j,1)
      count = count + 1;
    end
  end
  if count == 0
    classes = [classes; Y(i,1)];
  end
end
This gives me the list of unique labels in the vector Y. However, I was wondering whether there is a better way to write this code (the lines above look childish to me), especially by vectorizing it. Any suggestions are welcome. Thanks.
If the purpose of the code is just to generate a list of the unique values in Y, you can simply use unique(Y). For example:
>> m = 10;
>> Y = floor(rand(m,1)*5+1)
Y =
5
1
5
4
2
2
1
5
1
4
>> unique(Y)
ans =
1
2
4
5
Note that unique returns the values sorted, whereas your code lists the classes in the order they first appear in Y, e.g.,
classes =
5
1
4
2
If that order is important, you'll need something like this:
>> [sortedClasses idx] = unique(Y,"first")
sortedClasses =
1
2
4
5
idx =
2
5
4
1
>> unsortedClasses = Y(sort(idx))
unsortedClasses =
5
1
4
2
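The same steps can be put together as a short script. This is a minimal sketch that uses a fixed Y (the values from the random example above) so the result is reproducible; numClasses is a name introduced here for illustration, since the question also asks for the number of classes.

```octave
% Fixed label vector: the same values as the random example above.
Y = [5; 1; 5; 4; 2; 2; 1; 5; 1; 4];

% Sorted unique labels, plus the index of each label's first occurrence in Y.
[sortedClasses, idx] = unique(Y, "first");

% Sorting those indices restores the order of first appearance.
classes = Y(sort(idx));        % [5; 1; 4; 2]

% Number of distinct classes (hypothetical variable name).
numClasses = numel(classes);   % 4
```

MATLAB's unique also documents a "stable" option that returns the values in first-appearance order directly; whether it is available depends on your Octave version, so it is worth checking before relying on it.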
Both unique and sort are well vectorized for speed. Dropping the loop also removes the repeated expansion of classes, which avoids repeatedly copying the variable; that copying would impose significant overhead if you had a very large number of classes.
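For intuition about what a loop-free version looks like, here is a minimal sketch that sorts the labels and keeps every value that differs from its predecessor. It only illustrates the idea and is not a description of how unique is implemented internally; it assumes Y is a non-empty column vector, and isNew is a name made up for this example.

```octave
% Example labels (same values as in the transcript above).
Y = [5; 1; 5; 4; 2; 2; 1; 5; 1; 4];

% After sorting, equal labels sit next to each other.
s = sort(Y);

% Mark the first element and every element that differs from the one before it.
isNew = [true; diff(s) ~= 0];

% Logical indexing keeps one copy of each label, in sorted order.
classes = s(isNew);   % [1; 2; 4; 5]
```

In practice unique(Y) is still the simpler choice; the sketch just shows that the whole job reduces to a few whole-vector operations with no growing array.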