I'm a complete beginner with KMeans. How do I know what to pass as X? I have a dataframe with several rows and columns, and I don't know how to pick one specific X value.
I can't simply substitute the entire dataframe. For example:
import pandas as pd
from sklearn.cluster import KMeans
df = pd.read_csv("cereal.csv")
kmeans = KMeans(n_clusters=4)
kmeans.fit(X)  # How do I get this X?
X is the feature matrix: a 2D array with one row per sample and one column per feature. Here it is basically all the values from your dataframe, which in this case is df.
For example:
import numpy as np
from sklearn.cluster import KMeans
# np.float was removed in NumPy 1.24; the built-in float does the same job
X = df.values.astype(float)
kmeans = KMeans(n_clusters=4).fit(X)
To see the labels assigned, you can now do:
predicted_values = kmeans.labels_
You may have to clean the data and remove some features before passing it to the KMeans algorithm. For example, an ID column, if you have one, should be dropped, because it says nothing about how similar two rows are.
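As a rough sketch of that cleaning step, the snippet below drops an identifier column, keeps only the numeric columns, and removes rows with missing values before building X. The small dataframe and its column names (id, calories, sugars) are made up for illustration and stand in for whatever cereal.csv actually contains; n_clusters=2 is chosen only because the toy data is tiny.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in for the real data; the column names are assumptions.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "calories": [70, 120, 110, 50, 130, 100],
    "sugars": [6, 8, 12, 0, 14, 3],
})

# Drop the identifier column: it is not a meaningful feature.
features = df.drop(columns=["id"])

# Keep only numeric columns and drop rows with missing values,
# since KMeans cannot handle NaN.
features = features.select_dtypes(include="number").dropna()

# X is a 2D float array: one row per sample, one column per feature.
X = features.to_numpy(dtype=float)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)
```

Which columns to drop depends on your data; the point is only that X should contain the columns you actually want the clustering to be based on.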
If any of your columns have string values, they need to be encoded into a numerical format. For example, you cannot pass values like high or low; you need to encode them as 0 or 1.
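One simple way to do that encoding, sketched here with a made-up column called fiber_level (an assumption, not a column known to be in cereal.csv), is to map each string to a number with pandas:

```python
import pandas as pd

# Hypothetical data; the fiber_level column is an assumption for illustration.
df = pd.DataFrame({
    "calories": [70, 120, 110, 50],
    "fiber_level": ["high", "low", "low", "high"],
})

# Map each string category to a number explicitly.
df["fiber_level"] = df["fiber_level"].map({"low": 0, "high": 1})

# Every column is now numeric, so the frame can be turned into X.
X = df.to_numpy(dtype=float)
print(X)
```

For a column with more than two unordered categories, pd.get_dummies is a common alternative, since it avoids implying an order between the categories.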