I am currently working on a project in which I want to cluster multi-dimensional data. I tried K-Means and DBSCAN, which are two very different algorithms.
The K-Means model gave a fairly good result with 5 clusters, but I have read that Euclidean distance becomes unreliable when the dimensionality is large, so I don't know if I can trust this model.
The DBSCAN model, on the other hand, labelled a lot of points as noise and put most of the remaining points into a single cluster. I tried the k-nearest-neighbor distance plot method to find the optimal eps, but I can't seem to make the model work. This led me to conclude that the points may be very densely packed, and that this may be why so many of them end up in one cluster.
For clustering, I am using 10 different columns of data. Should I change the algorithm I am using? What would be a better algorithm for multi-dimensional data and with less-varying density?
You can first reduce the dimensionality of your dataset with PCA, LDA, t-SNE, or an autoencoder, and then run a standard clustering algorithm on the reduced data.
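As a rough sketch of the reduce-then-cluster idea with scikit-learn, something like the following could be a starting point. The data here is synthetic (`make_blobs` with 10 features) and only stands in for your 10 columns; the 90% explained-variance threshold and `n_clusters=5` are assumptions you would want to tune, not recommendations.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 10-column dataset (assumption: swap in your own array).
X, _ = make_blobs(n_samples=500, n_features=10, centers=5, random_state=0)

# Standardize so that no single column dominates the Euclidean distance.
X_scaled = StandardScaler().fit_transform(X)

# Keep as many principal components as needed to explain about 90% of the variance.
pca = PCA(n_components=0.90, random_state=0)
X_reduced = pca.fit_transform(X_scaled)

# Cluster in the reduced space; 5 clusters mirrors the K-Means result in the question.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_reduced)

# Silhouette score lies in [-1, 1]; higher suggests better-separated clusters.
score = silhouette_score(X_reduced, labels)
print(f"components kept: {X_reduced.shape[1]}, silhouette: {score:.3f}")
```

Comparing the silhouette score before and after the reduction, and for a few different values of `n_clusters`, gives a simple check on whether the reduction actually helps on your data.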
Another option is to use deep clustering methods. This blog post gives a really nice explanation of how to apply deep clustering to a high-dimensional dataset.