I have a dataset of bets on soccer matches. I am carrying out outlier detection using three parameters: the odds that the home team wins, the odds that the match ends in a draw, and the odds that the away team wins.
Each record looks something like this:
Home Draw Away
1.320 5.700 13.500
I have identified the clusters but am having difficulty telling which one contains the noise. The most plausible candidate seems to be the last cluster (i.e., if I have 10 clusters, cluster 10 would be the noise).
Is this the correct way to obtain outliers from my dataset using DBSCAN, or is there a better way?
Also, how can I find out how many clusters I have, so that I can get the last one (the one with the noise) without checking manually?
I am completely new to statistical programming and outlier detection, I apologise if I sound utterly clueless.
Read the documentation, please.
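The cluster membership it returns is documented as an "integer vector coding cluster membership with noise observations (singletons) coded as 0". So the noise is not the last cluster: it is every observation labelled 0.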
It's there: just search for the word "noise" in the dbscan manual.
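To illustrate that convention, here is a minimal pure-Python sketch of DBSCAN. It is not the R package's implementation, and Python is used only for illustration. It assumes Euclidean distance on the raw (Home, Draw, Away) odds. The sample records, the `eps` and `min_pts` values, and the helper names `dbscan` and `region_query` are all hypothetical. Like the quoted documentation, it codes noise as 0 and clusters as 1..k. The outliers are therefore simply the records labelled 0, and the number of clusters is the number of distinct non-zero labels.

```python
import math
from typing import List, Optional, Sequence

NOISE = 0  # same convention as the quoted documentation: noise is coded as 0


def region_query(points: Sequence[Sequence[float]], i: int, eps: float) -> List[int]:
    """Indices of all points within eps (Euclidean) of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Hypothetical helper: label each point 1..k for its cluster, or 0 for noise."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not visited yet
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster_id += 1  # i is a core point: start a new cluster
        labels[i] = cluster_id
        seeds = list(neighbours)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned, nothing to expand
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)  # j is also a core point: keep expanding
    return [NOISE if label is None else label for label in labels]


# Hypothetical (Home, Draw, Away) odds records
odds = [
    (1.30, 5.70, 13.50),
    (1.32, 5.60, 13.00),
    (1.35, 5.50, 12.50),
    (2.10, 3.40, 3.60),
    (2.05, 3.45, 3.70),
    (2.15, 3.30, 3.50),
    (9.00, 1.10, 25.0),  # an unusual record
]

labels = dbscan(odds, eps=1.5, min_pts=3)
outliers = [rec for rec, label in zip(odds, labels) if label == NOISE]
n_clusters = len({label for label in labels if label != NOISE})

print(labels)      # [1, 1, 1, 2, 2, 2, 0]
print(n_clusters)  # 2
print(outliers)    # [(9.0, 1.1, 25.0)]
```

Because noise gets its own label instead of being folded into the last cluster, you never need to count clusters to find it. Other implementations may use a different code: scikit-learn's `DBSCAN`, for example, marks noise as -1 in `labels_`.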