artificial-intelligence, cluster-analysis, k-means, unsupervised-learning, dimensionality-reduction

Unsupervised learning: dimensionality reduction and clustering


I am trying to understand how I can split my data into clusters using unsupervised learning, for example with the k-means method.

I have 20 columns of data. How can they be projected onto a 2D surface without losing the necessary information carried by the other 18 columns?

What should I use to do that?

Any help will be appreciated.


Solution

  • If you are simply interested in viewing your data in 2 dimensions, consider using t-SNE. The scikit-learn Python package has a good implementation you can use (sklearn.manifold.TSNE). Just remember that you shouldn't cluster your data on the t-SNE output, because the space your data resides in gets significantly distorted in the process: only short distances are maintained, whereas longer distances are heavily altered and may end up either shorter or longer. Run the clustering on the original columns and use the 2D embedding only for plotting.
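
To make the advice above concrete, here is a minimal sketch using scikit-learn. The data is a synthetic stand-in generated with make_blobs (300 rows, 20 columns), and the choice of 3 clusters, the scaling step, and perplexity=30 are all assumptions for illustration, not values taken from the question. The point it shows is the order of operations: k-means runs on the full 20-column data, and t-SNE is used only to get 2D coordinates for a plot.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real table: 300 rows, 20 columns (an assumption).
X, _ = make_blobs(n_samples=300, n_features=20, centers=3, random_state=0)

# Put all columns on a comparable scale, since both methods rely on distances.
X_scaled = StandardScaler().fit_transform(X)

# Cluster in the original 20-dimensional space, NOT in the t-SNE output.
# n_clusters=3 is an illustrative guess; choose it for your own data.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# t-SNE is used only to obtain 2D coordinates for visualization.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
embedding = tsne.fit_transform(X_scaled)

print(embedding.shape, labels.shape)
```

You can then scatter-plot embedding[:, 0] against embedding[:, 1] and color the points by labels. Distances between far-apart groups in that picture should not be read as meaningful, for the reason given in the answer.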