This is a follow-up to a previous question, where we noted that using Euclidean distances with lat/long coordinates does not yield correct results. I read in the documentation that ELKI supports geographic data, namely in its distance functions, which the various clustering algorithms can use. In the ELKI user interface, I can see there are options to replace the default distance function (Euclidean) with a better-suited one. I also see that in that case you need to provide a datum, which makes sense, since you have to tell ELKI how the data is projected. My options in the UI are to use "geo.LngLatDistanceFunction", since my coordinates are (x,y), i.e. longitude first, and "WGS84SpheroidEarthModel", since the data is in EPSG:4326. I am trying to parameterize my algorithm accordingly in Java, but I am not sure how to do it. If I initialize my parameters like this:
import de.lmu.ifi.dbs.elki.utilities.optionhandling.parameterization.ListParameterization;

ListParameterization params2 = new ListParameterization();
params2.addParameter(de.lmu.ifi.dbs.elki.algorithm.clustering.DBSCAN.Parameterizer.MINPTS_ID, minPoints);
params2.addParameter(de.lmu.ifi.dbs.elki.algorithm.clustering.DBSCAN.Parameterizer.EPSILON_ID, epsilon);
Could I set the distance function like this?
params2.addParameter(de.lmu.ifi.dbs.elki.algorithm.DistanceBasedAlgorithm.DISTANCE_FUNCTION_ID,
de.lmu.ifi.dbs.elki.distance.distancefunction.geo.LngLatDistanceFunction.class);
What about the geo.model parameter? (I have no clue about this.)
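To see why the premise of the question holds, here is a minimal, self-contained Java sketch (not ELKI code) that compares the naive Euclidean distance on raw degrees with a great-circle distance on a sphere. The class name, the method names and the mean Earth radius of 6,371,008.8 m are assumptions chosen for illustration. Two point pairs that are both exactly 1 degree apart in longitude get great-circle distances that differ by roughly a factor of two once one pair is moved to 60°N, so a single epsilon expressed in degrees cannot mean the same physical distance everywhere.

```java
/** Hypothetical demo class: Euclidean distance on degrees vs. great-circle distance. */
public class EuclidVsGreatCircle {
    // Assumption: IUGG mean Earth radius in meters; ELKI's own constant may differ.
    static final double R = 6371008.8;

    /** Great-circle distance in meters on a sphere (Vincenty formula for the sphere). */
    static double sphereMeters(double lng1, double lat1, double lng2, double lat2) {
        double p1 = Math.toRadians(lat1), p2 = Math.toRadians(lat2);
        double dl = Math.toRadians(lng2 - lng1);
        double y = Math.hypot(Math.cos(p2) * Math.sin(dl),
                Math.cos(p1) * Math.sin(p2) - Math.sin(p1) * Math.cos(p2) * Math.cos(dl));
        double x = Math.sin(p1) * Math.sin(p2) + Math.cos(p1) * Math.cos(p2) * Math.cos(dl);
        return R * Math.atan2(y, x); // central angle (radians) times radius
    }

    /** Naive Euclidean distance on raw (lng, lat) degrees. */
    static double euclidDegrees(double lng1, double lat1, double lng2, double lat2) {
        return Math.hypot(lng2 - lng1, lat2 - lat1);
    }

    public static void main(String[] args) {
        // Both pairs are exactly 1.0 apart in "degree space"...
        System.out.println(euclidDegrees(0, 0, 1, 0) + " vs " + euclidDegrees(0, 60, 1, 60));
        // ...but about 111 km apart on the equator and roughly half that at 60 degrees north.
        System.out.printf("%.0f m vs %.0f m%n", sphereMeters(0, 0, 1, 0), sphereMeters(0, 60, 1, 60));
    }
}
```

The shrinking factor is essentially cos(latitude), which is exactly what a geodetic distance function accounts for and plain Euclidean distance ignores.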
The default earth model is SphericalVincentyEarthModel, which is supposedly a bit faster (but assumes a spherical earth instead of a spheroid). This should not make much of a difference unless you need precision to the meter: the maximum error should be 0.3% of the distance, according to this answer.
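The size of the sphere-versus-spheroid gap can be checked with a second self-contained Java sketch (again, not ELKI's implementation). It computes the same distance once with the Vincenty formula on a sphere and once with Vincenty's iterative inverse formula on the WGS84 ellipsoid (semi-major axis 6,378,137 m, flattening 1/298.257223563). The sphere radius of 6,371,008.8 m and all class and method names are assumptions for illustration; the exact gap depends on latitude, direction and the radius chosen for the sphere.

```java
/** Hypothetical demo class: spherical vs. WGS84-ellipsoidal distance. */
public class SphereVsSpheroid {
    static final double R = 6371008.8;                      // assumed mean sphere radius (m)
    static final double WGS84_A = 6378137.0;                // WGS84 semi-major axis (m)
    static final double WGS84_F = 1 / 298.257223563;        // WGS84 flattening
    static final double WGS84_B = WGS84_A * (1 - WGS84_F);  // semi-minor axis (m)

    /** Great-circle distance on a sphere (Vincenty formula for the sphere), in meters. */
    static double sphereMeters(double lng1, double lat1, double lng2, double lat2) {
        double p1 = Math.toRadians(lat1), p2 = Math.toRadians(lat2);
        double dl = Math.toRadians(lng2 - lng1);
        double y = Math.hypot(Math.cos(p2) * Math.sin(dl),
                Math.cos(p1) * Math.sin(p2) - Math.sin(p1) * Math.cos(p2) * Math.cos(dl));
        double x = Math.sin(p1) * Math.sin(p2) + Math.cos(p1) * Math.cos(p2) * Math.cos(dl);
        return R * Math.atan2(y, x);
    }

    /** Vincenty's inverse formula on the WGS84 ellipsoid, in meters (may not converge for near-antipodal points). */
    static double spheroidMeters(double lng1, double lat1, double lng2, double lat2) {
        double f = WGS84_F, a = WGS84_A, b = WGS84_B;
        double lngDiff = Math.toRadians(lng2 - lng1);
        // Reduced latitudes on the auxiliary sphere
        double u1 = Math.atan((1 - f) * Math.tan(Math.toRadians(lat1)));
        double u2 = Math.atan((1 - f) * Math.tan(Math.toRadians(lat2)));
        double sinU1 = Math.sin(u1), cosU1 = Math.cos(u1), sinU2 = Math.sin(u2), cosU2 = Math.cos(u2);
        double lambda = lngDiff, lambdaPrev, sinSigma, cosSigma, sigma, cosSqAlpha, cos2SigmaM;
        int iter = 0;
        do {
            double sinLambda = Math.sin(lambda), cosLambda = Math.cos(lambda);
            double t1 = cosU2 * sinLambda, t2 = cosU1 * sinU2 - sinU1 * cosU2 * cosLambda;
            sinSigma = Math.sqrt(t1 * t1 + t2 * t2);
            if (sinSigma == 0) return 0; // coincident points
            cosSigma = sinU1 * sinU2 + cosU1 * cosU2 * cosLambda;
            sigma = Math.atan2(sinSigma, cosSigma);
            double sinAlpha = cosU1 * cosU2 * sinLambda / sinSigma;
            cosSqAlpha = 1 - sinAlpha * sinAlpha;
            // On an equatorial line cosSqAlpha is 0 and cos2SigmaM is taken as 0
            cos2SigmaM = cosSqAlpha != 0 ? cosSigma - 2 * sinU1 * sinU2 / cosSqAlpha : 0;
            double c = f / 16 * cosSqAlpha * (4 + f * (4 - 3 * cosSqAlpha));
            lambdaPrev = lambda;
            lambda = lngDiff + (1 - c) * f * sinAlpha * (sigma + c * sinSigma
                    * (cos2SigmaM + c * cosSigma * (-1 + 2 * cos2SigmaM * cos2SigmaM)));
        } while (Math.abs(lambda - lambdaPrev) > 1e-12 && ++iter < 200);
        double uSq = cosSqAlpha * (a * a - b * b) / (b * b);
        double bigA = 1 + uSq / 16384 * (4096 + uSq * (-768 + uSq * (320 - 175 * uSq)));
        double bigB = uSq / 1024 * (256 + uSq * (-128 + uSq * (74 - 47 * uSq)));
        double deltaSigma = bigB * sinSigma * (cos2SigmaM + bigB / 4 * (cosSigma * (-1 + 2 * cos2SigmaM * cos2SigmaM)
                - bigB / 6 * cos2SigmaM * (-3 + 4 * sinSigma * sinSigma) * (-3 + 4 * cos2SigmaM * cos2SigmaM)));
        return b * bigA * (sigma - deltaSigma);
    }

    public static void main(String[] args) {
        double[][] pairs = { {0, 0, 1, 0}, {0, 45, 1, 46} }; // (lng1, lat1, lng2, lat2)
        for (double[] p : pairs) {
            double s = sphereMeters(p[0], p[1], p[2], p[3]);
            double e = spheroidMeters(p[0], p[1], p[2], p[3]);
            System.out.printf("sphere %.1f m, spheroid %.1f m, diff %.3f%%%n", s, e, 100 * Math.abs(s - e) / e);
        }
    }
}
```

The spherical formula is closed-form and cheap, while the ellipsoidal one needs an iteration; that trade-off is the usual reason a spherical model is offered as the faster default.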
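To set the earth model parameter, use EarthModel.MODEL_ID as the option ID (as referenced by the Parameterizer of LngLatDistanceFunction), for example:
params2.addParameter(de.lmu.ifi.dbs.elki.math.geodesy.EarthModel.MODEL_ID,
de.lmu.ifi.dbs.elki.math.geodesy.WGS84SpheroidEarthModel.class);
When trying to find the appropriate option ID, always have a look at the Parameterizers; we are slowly moving all the option IDs into the Parameterizers.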