
ELKI DBSCAN: How to set dbc.parser?


I am running DBSCAN clustering, and besides latitude and longitude my data has one more column that I want to see alongside the cluster results. For example, the data looks like this:

28.6029445  77.3443552  1
28.6029511  77.3443573  2
28.6029436  77.3443458  3
28.6029011  77.3443032  4
28.6028967  77.3443042  5
28.6029087  77.3442829  6
28.6029132  77.3442797  7

In the MiniGUI, if I set parser.labelindices to 2 and run the task, the output looks like this:

# Cluster: Cluster 0
ID=63222 28.6031295 77.3407848 441
ID=63225 28.603134 77.3407744 444
ID=63220 28.6031566667 77.3407816667 439
ID=63226 28.6030819 77.3407605 445
ID=63221 28.6032 77.3407616667 440
ID=63228 28.603085 77.34071 447
ID=63215 28.60318 77.3408583333 434
ID=63229 28.6030751 77.3407096 448

So each point is still associated with the 3rd column, which I passed as a label. I have compared this with the clustering result obtained by passing just latitude and longitude, and it is exactly the same. In other words, by declaring a column as a label I can retrieve that column together with the latitude and longitude in the cluster results.

Now I want to do the same from my Java code:

// Setup parameters:
ListParameterization params = new ListParameterization();
params.addParameter(
        FileBasedDatabaseConnection.Parameterizer.INPUT_ID,
        fileLocation);
params.addParameter(
        NumberVectorLabelParser.Parameterizer.LABEL_INDICES_ID,
        2);
params.addParameter(
        AbstractDatabase.Parameterizer.INDEX_ID,
        RStarTreeFactory.class);

But this throws a NullPointerException. In the MiniGUI, dbc.parser is NumberVectorLabelParser by default, so I would expect this to work. What am I missing?


Solution

  • I will look into the NPE; it should produce a more helpful error message instead.

    Most likely, the problem is that this parameter is of type List<Integer>, so you need to pass a list rather than a single int. Alternatively, you can pass a String, which will be parsed. The following should work just fine:

    params.addParameter(
            NumberVectorLabelParser.Parameterizer.LABEL_INDICES_ID,
            "2");
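
To illustrate why a bare int fails where a list or a String works, here is a minimal stand-alone sketch. It is not ELKI code: the class IndexListParam and the method coerce are hypothetical names, and the actual parsing inside ELKI may differ. It only shows the general idea of a parameter that accepts a list of integers or a comma-separated String and rejects anything else with a clear message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is NOT how ELKI implements its parameters.
final class IndexListParam {
    /** Coerce a parameter value into a list of integer indices. */
    static List<Integer> coerce(Object value) {
        List<Integer> out = new ArrayList<>();
        if (value instanceof List<?>) {
            // A list is accepted if every element is an Integer.
            for (Object o : (List<?>) value) {
                if (!(o instanceof Integer)) {
                    throw new IllegalArgumentException("List elements must be Integer");
                }
                out.add((Integer) o);
            }
            return out;
        }
        if (value instanceof String) {
            // A String is parsed as comma-separated integers, e.g. "2" or "2,3".
            for (String part : ((String) value).split(",")) {
                out.add(Integer.parseInt(part.trim()));
            }
            return out;
        }
        // Anything else (such as a bare Integer) is rejected explicitly.
        String type = (value == null) ? "null" : value.getClass().getSimpleName();
        throw new IllegalArgumentException(
                "Expected a List<Integer> or a String, got " + type);
    }

    public static void main(String[] args) {
        System.out.println(coerce("2"));              // parsed from a String
        System.out.println(coerce(Arrays.asList(2))); // passed as a list
        try {
            coerce(2); // a bare int is autoboxed to Integer and rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch, both coerce("2") and coerce(Arrays.asList(2)) yield a one-element list, while coerce(2) fails with an explicit message instead of an NPE.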
    

    Note that the text writer might (I have not checked this) print labels as is, next to the coordinates. So you cannot tell from the output alone whether the data set was treated as 2-dimensional with a label or as 3-dimensional.

    The debugging result handler, enabled with -resulthandler LogResultStructureResultHandler -verbose, should show you the types of the loaded relations. For example:

    java -jar elki.jar KDDCLIApplication -dbc.in dbpedia.gz \
    -algorithm NullAlgorithm \
    -resulthandler LogResultStructureResultHandler -verbose
    

    This should yield output like this:

    de.lmu.ifi.dbs.elki.datasource.FileBasedDatabaseConnection.load: 1941 ms
    de.lmu.ifi.dbs.elki.algorithm.NullAlgorithm.runtime: 0 ms
    BasicResult: Algorithm Step (main)
     StaticArrayDatabase: Database (database)
      DBIDView: Database IDs (DBID)
      MaterializedRelation: DoubleVector,dim=2 (relation)
      MaterializedRelation: LabelList (relation)
     SettingsResult: Settings (settings)
    

    In this case, my data set consists of coordinates from Wikipedia, each with a name. I get a 2-dimensional DoubleVector relation and a LabelList relation storing the object names.
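
The split into a vector relation and a label relation can be sketched in plain Java. This is a simplified illustration under stated assumptions, not the ELKI parser: LabelColumnSplit, Row, and parse are hypothetical names, columns are assumed to be whitespace-separated, and label indices are assumed to be zero-based (so index 2 is the third column, as in the question).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of separating label columns from numeric columns.
final class LabelColumnSplit {
    /** One parsed row: the numeric vector plus the columns kept as labels. */
    static final class Row {
        final double[] vector;
        final List<String> labels;

        Row(double[] vector, List<String> labels) {
            this.vector = vector;
            this.labels = labels;
        }
    }

    /** Split a whitespace-separated line; columns at labelIndices stay as text. */
    static Row parse(String line, List<Integer> labelIndices) {
        String[] cols = line.trim().split("\\s+");
        List<String> labels = new ArrayList<>();
        double[] tmp = new double[cols.length];
        int dim = 0;
        for (int i = 0; i < cols.length; i++) {
            if (labelIndices.contains(i)) {
                labels.add(cols[i]);                      // kept as a label, not clustered on
            } else {
                tmp[dim++] = Double.parseDouble(cols[i]); // part of the vector
            }
        }
        return new Row(Arrays.copyOf(tmp, dim), labels);
    }

    public static void main(String[] args) {
        Row r = parse("28.6029445  77.3443552  1", Arrays.asList(2));
        System.out.println("dim=" + r.vector.length + " labels=" + r.labels);
        // prints: dim=2 labels=[1]
    }
}
```

With the third column declared as a label, the vector stays 2-dimensional, which matches the observation that the clustering result is unchanged while the extra column still appears in the output.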