Importing data for k-means clustering

I'm trying to follow this example:

library(tidyverse)  # data manipulation
library(cluster)    # clustering algorithms
library(factoextra) # clustering algorithms & visualization

distance <- get_dist(df)
fviz_dist(distance, gradient = list(low = "#00AFBB", mid = "white", high = "#FC4E07"))

Which, as expected, works great.

It may be something really simple: why is there no column name for what is obviously the state field?

If I try to use the same approach with a dataset like this:

ipl <- read.csv("", header=TRUE, stringsAsFactors=FALSE)
ipl <- na.omit(ipl)

distanceipl <- get_dist(ipl)
fviz_dist(distanceipl, gradient = list(low = "#00AFBB", mid = "white", high = "#FC4E07"))

Instead of the player names on each axis, I get what I think are the row numbers. How do I get the player names from the PLAYER column onto the axes?

There are two solutions here: either label the visualisation using ggplot2

+ scale_y_discrete(labels = FIELDFORLABELLING)

or assign the player names to the row names:

rownames(dataframe) <- dataframe$FIELDFORLABELLING
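
As a rough sketch of the row-name approach: the data frame below is a made-up stand-in for the ipl CSV (RUNS and WKTS are hypothetical column names, not taken from the real file). It also drops the non-numeric columns before computing distances, on the assumption that PLAYER is the only text column. Setting row names has a second advantage: as far as I know, fviz_dist() reorders the observations by default (order = TRUE), so labels supplied by position may not line up with the tiles, whereas row names travel with the data.

```r
# Hypothetical stand-in for the ipl data; the real CSV is not shown here.
ipl <- data.frame(
  PLAYER = c("Player A", "Player B", "Player C"),
  RUNS   = c(450, 120, 300),   # made-up column
  WKTS   = c(2, 15, 7),        # made-up column
  stringsAsFactors = FALSE
)
ipl <- na.omit(ipl)

# Move the names into the row names, then keep only the numeric columns
# so the distance calculation is not fed a character column.
rownames(ipl) <- ipl$PLAYER
ipl_num <- ipl[, sapply(ipl, is.numeric), drop = FALSE]

# Base R dist() carries the row names along as the labels of the result.
d <- dist(scale(ipl_num))
attr(d, "Labels")

# If factoextra is installed, the same data frame can be plotted directly.
if (requireNamespace("factoextra", quietly = TRUE)) {
  distanceipl <- factoextra::get_dist(ipl_num)
  print(factoextra::fviz_dist(
    distanceipl,
    gradient = list(low = "#00AFBB", mid = "white", high = "#FC4E07")
  ))
}
```

Note that row names must be unique, so this only works if no player appears twice in the data.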

Thanks for the answers!


  • From the docs:

    fviz_dist(): returns a ggplot2

    So you can just add labels the way you would with a normal ggplot2 object, i.e.:

    fviz_dist(distanceipl, gradient = list(low = "#00AFBB", mid = "white", high = "#FC4E07")) + scale_y_discrete(labels = ipl$PLAYER)