python cluster-analysis data-science

How to visualize k-means of multiple columns


I'm not a data scientist, but I am intrigued by data science, machine learning, and so on.

In my efforts to understand all of this, I am continuously building a dataset (scraped daily) of Grand Exchange prices from one of my favourite games, Old School RuneScape.

One of my goals is to pick a set of stocks/items that would give me the most profit. Currently I am trying out clustering with k-means to find stocks that are similar to each other, based on some basic features that I could think of.

However, I have no clue whether what I'm doing is correct. For example:

In y = kmeans.fit_predict(df_items), my item_id column is included, so is k-means actually treating item_id as a feature now?

And how do I even visualise the outcome? What goes on the x axis and what goes on the y axis when I have multiple columns?

https://github.com/extreme4all/OSRS_DataSet/blob/master/NoteBooks/Stock%20Picking.ipynb


Solution

  • To visualize the clusters, you have to reduce the data to 2 or 3 dimensions. You can then use color as a fourth dimension or, in your case, to indicate the cluster number.

    t-SNE is a common choice for this task; see the sklearn docs for details: https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html
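
As a rough sketch of the approach above (not the notebook's actual code): the data below is randomly generated, and the feature names mean_price, price_volatility and daily_volume are made up for illustration. KMeans.fit_predict clusters on every column it is given, so item_id is dropped before fitting. The StandardScaler step is an extra assumption, added because k-means is distance-based and columns on very different scales would otherwise dominate. The two t-SNE axes carry no direct meaning; only which points end up near each other is informative.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Made-up stand-in for df_items: an identifier plus three hypothetical features.
rng = np.random.default_rng(0)
n_items = 200
df_items = pd.DataFrame({
    "item_id": np.arange(n_items),
    "mean_price": rng.lognormal(mean=6.0, sigma=1.0, size=n_items),
    "price_volatility": rng.random(n_items),
    "daily_volume": rng.lognormal(mean=8.0, sigma=1.5, size=n_items),
})

# item_id is only a label, so keep it out of the feature matrix.
features = df_items.drop(columns=["item_id"])

# Assumption: put all features on a comparable scale before clustering.
X = StandardScaler().fit_transform(features)

# Cluster on the scaled features; 4 clusters is an arbitrary choice here.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Reduce the feature space to 2 dimensions purely for plotting.
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

# x and y are the two t-SNE dimensions; color indicates the cluster number.
fig, ax = plt.subplots()
scatter = ax.scatter(embedding[:, 0], embedding[:, 1], c=labels, cmap="tab10", s=15)
ax.set_xlabel("t-SNE dimension 1")
ax.set_ylabel("t-SNE dimension 2")
ax.legend(*scatter.legend_elements(), title="cluster")
fig.savefig("clusters_tsne.png")  # or plt.show() in an interactive session
```

To look up which items landed in which cluster, the labels can be attached back to the original frame, for example with df_items.assign(cluster=labels).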