I am using the Weka Java API and am trying to implement a custom distance class.
I created a new class "CustomDistance" extending "NormalizableDistance" and gave it exactly the same body as "EuclideanDistance". My aim is to modify the "distance" functions so that nominal attributes are not simply treated as distance 0 (match) or 1 (mismatch), but as something more sophisticated. However, by debugging the code (and printing to the console each time a method is called), I found that the only method called from that class is:
    protected double updateDistance(double currDist, double diff) {
        System.out.println("HERE3");
        double result = currDist + diff * diff;
        return result;
    }
So I was wondering, if not in the distance class, where is the distance between two instances calculated?
The actual distance computation (and the call to the update function that you posted) is done in NormalizableDistance#distance(Instance, Instance, double, PerformanceStats), which is the implementation of a method in the DistanceFunction interface. (There are several distance methods, but eventually they all delegate to this one.)
Source code (SVN): https://svn.cms.waikato.ac.nz/svn/weka/trunk/weka/src/main/java/weka/core/NormalizableDistance.java
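To illustrate why only updateDistance shows up in the subclass, here is a minimal standalone model of that call chain. It is a sketch, not Weka code: the class names SketchNormalizableDistance, SketchEuclideanDistance and DelegationDemo are made up, instances are reduced to plain double arrays, and the cutoff and PerformanceStats parameters are left out. It also returns the accumulated sum of squares without any further post-processing.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone model of the call chain; these are NOT Weka's classes.
abstract class SketchNormalizableDistance {
    // Records which methods run, so the delegation is visible.
    final List<String> calls = new ArrayList<>();

    // Plays the role of the base-class distance method:
    // the loop over the attributes lives here, not in the subclass.
    double distance(double[] first, double[] second) {
        double dist = 0.0;
        for (int i = 0; i < first.length; i++) {
            double diff = difference(i, first[i], second[i]);
            dist = updateDistance(dist, diff);
        }
        return dist;
    }

    // Per-attribute comparison, implemented in the base class.
    protected double difference(int index, double val1, double val2) {
        calls.add("base.difference");
        return val1 - val2;
    }

    // The hook that the concrete distance class supplies.
    protected abstract double updateDistance(double currDist, double diff);
}

class SketchEuclideanDistance extends SketchNormalizableDistance {
    @Override
    protected double updateDistance(double currDist, double diff) {
        calls.add("subclass.updateDistance");
        return currDist + diff * diff;
    }
}

class DelegationDemo {
    public static void main(String[] args) {
        SketchEuclideanDistance d = new SketchEuclideanDistance();
        double sumOfSquares = d.distance(new double[] {0, 0}, new double[] {3, 4});
        System.out.println(sumOfSquares); // 3*3 + 4*4
        System.out.println(d.calls);      // base and subclass calls alternate
    }
}
```

In this model the subclass contributes nothing but updateDistance; the loop and the per-attribute comparison both run in the base class, which matches what the debugging output in the question suggests.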
(I'm not sure whether this already qualifies as an answer, or whether it could be considered as "link only"... so some more words:)
In order to create such a distance function, you will likely have to dig a bit deeper into the concepts of Weka and the source code. Eventually, you might have to override this method from NormalizableDistance, where the actual comparison takes place and either 0 or 1 is returned for nominal attributes:
    protected double difference(int index, double val1, double val2) {
        switch (m_Data.attribute(index).type()) {
        case Attribute.NOMINAL:
            if (Utils.isMissingValue(val1) || Utils.isMissingValue(val2)
                    || ((int) val1 != (int) val2)) {
                return 1;
            } else {
                return 0;
            }
        ....
    }
(But maybe there are already easier or more elegant built-in ways to achieve this; I'm not that familiar with Weka.)
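As a rough idea of what a "more sophisticated" nominal comparison could look like inside such an override, the sketch below replaces the 0/1 rule with a lookup table of pairwise costs. It is plain Java without Weka on the classpath: the class NominalDifferenceSketch, the cost table and the low/medium/high example are hypothetical. It relies only on two things Weka does: nominal values are stored as the index of the label (as a double), and missing values are encoded as NaN. Missing values keep the maximal difference of 1, as in the snippet above.

```java
// Hypothetical helper, not part of Weka: a table-based nominal difference.
final class NominalDifferenceSketch {
    // costs[a][b] is the assumed difference between label indices a and b.
    private final double[][] costs;

    NominalDifferenceSketch(double[][] costs) {
        this.costs = costs;
    }

    // val1 and val2 are label indices stored as doubles; NaN means "missing".
    double difference(double val1, double val2) {
        if (Double.isNaN(val1) || Double.isNaN(val2)) {
            return 1.0; // same convention as the 0/1 rule: missing is maximally different
        }
        return costs[(int) val1][(int) val2];
    }

    public static void main(String[] args) {
        // Assumed ordering of labels: 0 = low, 1 = medium, 2 = high.
        double[][] costs = {
            {0.0, 0.5, 1.0},
            {0.5, 0.0, 0.5},
            {1.0, 0.5, 0.0}
        };
        NominalDifferenceSketch sketch = new NominalDifferenceSketch(costs);
        System.out.println(sketch.difference(0, 1)); // low vs. medium
        System.out.println(sketch.difference(0, 2)); // low vs. high
    }
}
```

In an actual subclass, logic like this would go into the Attribute.NOMINAL case of the overridden difference method, with one cost table per nominal attribute, while the other attribute types would presumably be left to the superclass.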