Tags: r, cluster-computing, noise, outliers, dbscan

Obtaining noise in DBSCAN using R


I have a dataset of bets on soccer matches. I am carrying out outlier detection using three variables: the odds that the home team wins, the odds that the match ends in a draw, and the odds that the away team wins.

Each record looks something like this:

 Home   Draw    Away
1.320  5.700  13.500

I have identified the clusters, but I am having difficulty telling which one contains the noise. The most plausible candidate seems to be the last cluster (e.g. if I have 10 clusters, cluster 10 would be the noise).

Is this the correct way to obtain outliers from my dataset with DBSCAN, or is there a better way?

Also, how can I find out how many clusters I have, so that I can pick out the last one (the one with the noise) without checking manually?

I am completely new to statistical programming and outlier detection; I apologise if I sound utterly clueless.


Solution

  • Read the documentation, please.

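    cluster: integer vector coding cluster membership with noise observations (singletons) coded as 0

    It's there; just search for the word "noise" in the manual of dbscan. So the noise is not the last cluster: every observation whose cluster value is 0 is noise, and the genuine clusters are numbered 1, 2, and so on. The number of genuine clusters is therefore simply the largest value in cluster, with no manual checking needed.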
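A minimal sketch of the whole workflow follows. In practice you would call a library implementation such as fpc::dbscan; to keep the example runnable without extra packages, `dbscan_sketch` below is a hypothetical base-R DBSCAN written only for illustration, using the same labelling convention as the documentation quoted above (clusters 1, 2, and so on, noise 0). The odds are made up apart from the first row, which is the example record from the question, and `eps = 0.5` / `min_pts = 3` are arbitrary values that happen to suit this toy data.

```r
# Hypothetical minimal DBSCAN in base R, for illustration only (in practice use
# fpc::dbscan or dbscan::dbscan). Labels follow the fpc convention quoted above:
# clusters are numbered 1..k and noise observations are coded 0.
dbscan_sketch <- function(x, eps, min_pts = 5) {
  x <- as.matrix(x)
  n <- nrow(x)
  d <- as.matrix(dist(x))                      # pairwise Euclidean distances
  # eps-neighbourhood of each point (includes the point itself)
  neighbours <- lapply(seq_len(n), function(i) which(d[i, ] <= eps))
  is_core <- lengths(neighbours) >= min_pts
  cluster <- integer(n)                        # 0 = unassigned, i.e. noise
  k <- 0L
  for (i in seq_len(n)) {
    if (!is_core[i] || cluster[i] != 0L) next  # only unvisited core points seed a cluster
    k <- k + 1L
    cluster[i] <- k
    queue <- i
    while (length(queue) > 0) {
      p <- queue[1]
      queue <- queue[-1]
      if (!is_core[p]) next                    # border points join but do not expand
      for (q in neighbours[[p]]) {
        if (cluster[q] == 0L) {
          cluster[q] <- k
          queue <- c(queue, q)
        }
      }
    }
  }
  cluster
}

# Made-up odds (Home, Draw, Away); the first row is the record from the question.
odds <- data.frame(
  Home = c(1.32, 1.30, 1.35, 1.33, 1.31, 2.10, 2.15, 2.05, 2.12, 2.08, 9.00),
  Draw = c(5.70, 5.60, 5.75, 5.65, 5.72, 3.30, 3.25, 3.35, 3.28, 3.32, 1.50),
  Away = c(13.5, 13.2, 13.8, 13.4, 13.6, 3.40, 3.45, 3.35, 3.42, 3.38, 1.20)
)

cl <- dbscan_sketch(odds, eps = 0.5, min_pts = 3)
# cl is now c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0): row 11 is noise.
# With fpc installed, the corresponding call would be
#   cl <- fpc::dbscan(odds, eps = 0.5, MinPts = 3)$cluster

outliers   <- odds[cl == 0, ]        # rows labelled as noise
n_clusters <- sum(unique(cl) > 0)    # 0 is the noise label, not a cluster
print(table(cl))
print(outliers)
```

Because noise always carries the label 0, selecting the outliers and counting the clusters are both one-liners, whatever the number of clusters. With real odds you would still need to tune eps and the minimum-points threshold, and possibly rescale the columns first if one of them spans a much wider range than the others (fpc::dbscan has a scale argument for this).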