I've been trying to implement the Neighbourhood Components Analysis (NCA) algorithm in Octave, but there is apparently something wrong with my code and I cannot figure out what it is.
Note: I am using Carl Edward Rasmussen's minimize function on the negative of f, which is equivalent to maximizing f.
Note 2: The test data I am using is the Wine dataset available at the UCI Machine Learning repository.
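With some external help, I found the answer to the question. The problem was that I wrongly assumed the product of the difference between datapoints i and j with itself should be a row vector times a column vector (an inner product, which gives a scalar). It should be the opposite, a column vector times a row vector (an outer product, which gives a matrix):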
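To make the distinction concrete, here is a minimal Octave sketch of the two products. The vectors `xi` and `xj` and their values are made up for illustration (they are not Wine data), and the sketch assumes each datapoint is stored as a column vector; it is not the code from my actual implementation.

```octave
% Hypothetical datapoints, stored as column vectors (assumption).
xi = [1; 2; 3];
xj = [0; 1; 5];

d = xi - xj;        % difference between datapoints i and j, a 3x1 column

inner = d' * d;     % row times column: 1x1 scalar (the squared distance)
outer = d * d';     % column times row: 3x3 matrix (what the gradient sums)

% The two are related: the trace of the outer product is the inner product.
same = (trace(outer) == inner);
```

If the datapoints are stored as rows instead (for example `d = X(i,:) - X(j,:)` for an N-by-D data matrix `X`), the roles flip: `d' * d` is then the D-by-D outer product and `d * d'` is the scalar, which is an easy way to end up with the wrong one.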