
Evaluation in ELKI


I know ELKI currently only includes unsupervised outlier detection methods, so ELKI doesn't divide the input data into a training set and a test set. However, I've seen that evaluation is performed against the minority class when class labels are available. I would like to know:

  1. Does ELKI use all input data for evaluation?
  2. Does the reported runtime include evaluation, or just training time?
  3. Does evaluation take the outlier scores into account to estimate the false positive rate and the true positive rate in order to evaluate rankings?
  4. In the LOF algorithm, for example, suppose an instance of the normal class has a high LOF score. Will it be considered a false positive or a true positive in evaluation?

Thanks!


Solution

    1. Yes, all input is used for unsupervised methods.

      The labels must not be used when running the algorithm; they are only used at evaluation time.

    2. The runtime is reported separately for every algorithm.

    3. This depends on your evaluation. Most measures (e.g. ROC AUC) will only take the ranking into account. To evaluate the actual scores, you first need to normalize them. For a measure that takes (normalized) scores into account, please see

      E. Schubert, R. Wojdanowski, A. Zimek, H.-P. Kriegel
      On Evaluation of Outlier Rankings and Outlier Scores
      In Proceedings of the 12th SIAM International Conference on Data Mining (SDM), Anaheim, CA: 1047–1058, 2012.

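    4. True positives and false positives require a binary decision, i.e., a threshold on the score: at a given threshold, a normal instance whose LOF score is at or above it counts as a false positive, while at a higher threshold the same instance counts as a true negative. See ROC AUC for an approach that does not require you to specify a threshold to make the decision binary, but instead evaluates all possible thresholds.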
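The thresholding described in point 4, and the fact from point 3 that ROC AUC only depends on the ranking, can be illustrated with a small self-contained Python sketch. It is not ELKI code: the helper names (`confusion_at_threshold`, `roc_auc`, `min_max_normalize`) are hypothetical, the scores and labels are made-up toy values (think of them as LOF scores), and min-max rescaling is used only as a simple monotone example of normalization, not the normalization discussed in the cited paper.

```python
# Illustrative sketch (not ELKI code): a threshold turns scores into a binary
# decision (TP/FP/FN/TN), and ROC AUC summarizes all thresholds at once.
# Scores and labels below are made-up toy values.

def confusion_at_threshold(scores, labels, threshold):
    """Return (TP, FP, FN, TN) when every point with score >= threshold is flagged."""
    tp = fp = fn = tn = 0
    for score, is_outlier in zip(scores, labels):
        flagged = score >= threshold
        if flagged and is_outlier:
            tp += 1
        elif flagged:
            fp += 1  # a normal point with a high score, e.g. a high LOF value
        elif is_outlier:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def roc_auc(scores, labels):
    """ROC AUC by sweeping every distinct score as a threshold (trapezoidal rule).

    Assumes both classes are present. Tied scores enter the curve together,
    which gives tied outlier/normal pairs half credit.
    """
    n_pos = sum(1 for is_outlier in labels if is_outlier)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above the maximum score: nothing flagged
    for t in sorted(set(scores), reverse=True):
        tp, fp, _, _ = confusion_at_threshold(scores, labels, t)
        points.append((fp / n_neg, tp / n_pos))  # (FPR, TPR)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def min_max_normalize(scores):
    """Rescale to [0, 1]; a simple monotone example, not the paper's normalization."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) for s in scores]


if __name__ == "__main__":
    scores = [3.1, 2.5, 1.2, 1.0, 0.9, 1.1]
    labels = [True, False, True, False, False, False]  # True = labeled outlier

    # The normal point scoring 2.5 is a false positive at threshold 2.5 ...
    print(confusion_at_threshold(scores, labels, 2.5))  # (1, 1, 1, 3)
    # ... but a true negative at threshold 3.0.
    print(confusion_at_threshold(scores, labels, 3.0))  # (1, 0, 1, 4)

    # ROC AUC depends only on the ranking, so a monotone rescaling leaves it unchanged.
    print(roc_auc(scores, labels))                      # 0.875
    print(roc_auc(min_max_normalize(scores), labels))   # 0.875
```

The sketch recounts the confusion matrix for every threshold, which is quadratic but easy to follow; an efficient implementation would sort the scores once and sweep through them.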