I use the ELKI framework to cluster a series of points, defined by their geographic coordinates (longitude, latitude). The algorithm I use is DBSCAN.
Now I would like to add another (numerical) attribute that weights the importance of the points (let's say size).
In theory, the points would now be defined in a three-dimensional space (rather than 2D), and the distance would be a mixture of geographic distance and data distance.
In practice, I tried to do this in ELKI, but I ran into a concrete problem. The clustering algorithms expect a "database" as input.
Clustering<DBSCANModel> de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm.run(Database database)
This database is created from a ListParameterization, which, among other things, reads a database connection:
params.addParameter(AbstractDatabase.Parameterizer.DATABASE_CONNECTION_ID, dbc);
Finally, this database connection reads the data from a 2D array:
Import an existing data matrix (double[rows][cols]) into an ELKI database.
DatabaseConnection dbc = new ArrayAdapterDatabaseConnection(array[][]);
My question is: is there any way to replace this 2D array with a higher-dimensional array?
For instance, in my case, I would like to use a 3D array to store the two geographic coordinates and the numerical attribute. Something like this:
array[][][]
If you want to put weight on the instances, you should switch to GeneralizedDBSCAN, and implement a weighted CorePredicate.
double[rows][cols] is fine. You have three columns: longitude, latitude, weight.
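To illustrate the idea of a weighted core predicate on such a three-column array, here is a minimal plain-Java sketch. It does not use ELKI's actual CorePredicate interface; the class name `WeightedCoreSketch`, the column constants, and the `minWeight` threshold are all made up for illustration. The point is only that a core test can sum the weight column over the neighborhood instead of counting neighbors.

```java
public class WeightedCoreSketch {
    // Column layout assumed from the answer: longitude, latitude, weight.
    public static final int LON = 0, LAT = 1, WEIGHT = 2;

    /** Sum the weight column over the given neighbor row indices. */
    public static double neighborhoodWeight(double[][] data, int[] neighbors) {
        double sum = 0.0;
        for (int idx : neighbors) {
            sum += data[idx][WEIGHT];
        }
        return sum;
    }

    /**
     * Weighted core test (hypothetical): rather than requiring at least
     * minPts neighbors, require the summed neighbor weight to reach minWeight.
     */
    public static boolean isCore(double[][] data, int[] neighbors, double minWeight) {
        return neighborhoodWeight(data, neighbors) >= minWeight;
    }

    public static void main(String[] args) {
        // Still a plain 2D array: one row per point, three columns per row.
        double[][] data = {
            {11.57, 48.14, 5.0},
            {11.58, 48.15, 0.5},
            {11.60, 48.13, 0.5},
        };
        // Pretend these are the rows inside the epsilon-neighborhood of row 0
        // (including row 0 itself); a real run would get them from a range query.
        int[] neighbors = {0, 1};
        System.out.println(isCore(data, neighbors, 3.0));
    }
}
```

In ELKI itself, the same logic would live inside a custom core predicate passed to GeneralizedDBSCAN; check the Javadoc of your ELKI version for the exact interface to implement.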
DimensionSelectingLatLngDistanceFunction can work with 3D vectors, too. You just have to specify which column holds the latitude and which holds the longitude.
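Conceptually, a dimension-selecting geographic distance just reads two chosen columns and ignores the rest. The sketch below shows that idea in plain Java with a haversine formula on a spherical Earth; it is not ELKI's implementation (which may use a different Earth model), and the class name `LatLngColumnsSketch` and the radius constant are assumptions for illustration.

```java
public class LatLngColumnsSketch {
    // Mean Earth radius in km (spherical approximation, assumed here).
    public static final double EARTH_RADIUS_KM = 6371.0;

    /**
     * Great-circle distance between two rows, reading latitude and longitude
     * (in degrees) from the given column indices. Other columns are ignored.
     */
    public static double haversineKm(double[] a, double[] b, int latCol, int lngCol) {
        double lat1 = Math.toRadians(a[latCol]);
        double lat2 = Math.toRadians(b[latCol]);
        double dLat = lat2 - lat1;
        double dLng = Math.toRadians(b[lngCol] - a[lngCol]);
        double sinLat = Math.sin(dLat / 2);
        double sinLng = Math.sin(dLng / 2);
        double h = sinLat * sinLat + Math.cos(lat1) * Math.cos(lat2) * sinLng * sinLng;
        // Clamp to 1.0 to guard against rounding pushing sqrt(h) above 1.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(h)));
    }

    public static void main(String[] args) {
        // Columns: longitude (0), latitude (1), weight (2).
        double[] p = {11.57, 48.14, 5.0};
        double[] q = {11.58, 48.15, 0.5};
        // The weight column does not influence the result.
        System.out.println(haversineKm(p, q, 1, 0));
    }
}
```

With this layout the weight stays in the vector for the core predicate to use, while the distance only ever looks at the two coordinate columns.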
Alternatively, you can build your own DatabaseConnection. It could return two relations: one is a 2D vector field containing latitude and longitude, the second is a 1D relation containing only the weights. But working with multiple relations can be tricky. The approach above is easier to use.
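For the two-relation variant, the data preparation amounts to splitting the three-column array into a coordinate part and a weight part that share the same row order. The following plain-Java sketch shows only that split; wiring the two parts into a custom DatabaseConnection is not shown, and the class name `TwoRelationsSketch` and the column order (longitude, latitude, weight) are assumptions.

```java
public class TwoRelationsSketch {
    /** Extract the first two columns (longitude, latitude) of every row. */
    public static double[][] coordinates(double[][] data) {
        double[][] coords = new double[data.length][2];
        for (int i = 0; i < data.length; i++) {
            coords[i][0] = data[i][0];
            coords[i][1] = data[i][1];
        }
        return coords;
    }

    /** Extract the third column (weight) of every row. */
    public static double[] weights(double[][] data) {
        double[] w = new double[data.length];
        for (int i = 0; i < data.length; i++) {
            w[i] = data[i][2];
        }
        return w;
    }

    public static void main(String[] args) {
        double[][] data = {
            {11.57, 48.14, 5.0},
            {11.58, 48.15, 0.5},
        };
        double[][] coords = coordinates(data);
        double[] w = weights(data);
        // Row i of coords and entry i of w describe the same point.
        System.out.println(coords.length + " points, first weight " + w[0]);
    }
}
```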