Search code examples
rdata-visualizationk-means

How to keep my categories visible on the plot after Kmeans achieved, in R?


How to I keep I visualise my factor variable without any number when clustering it's done? I want a more elegant data visualisation. Not as in this picture:

enter image description here

Yet, the one I retrieved from my kmean clustering does seem to be as nice as in the - https://afit-r.github.io/kmeans_clustering. As you can see, the author manages nicely to plot the cities as factor, where the names are visible.

Here is the data I have:

data_piv <- structure(list(Comorbidities = structure(1:9, .Label = c("asthma", 
    "diabetes_type_one", "diabetes_type_two", "heart_disease", "hypertension", 
    "kidney_disease", "liver_disease", "lung_condition", "obesity"
    ), class = "factor"), chills = c(26L, 22L, 23L, 43L, 22L, 15L, 
    43L, 24L, 20L), cough = c(58L, 57L, 56L, 57L, 59L, 60L, 62L, 
    58L, 59L), diarrhoea = c(21L, 14L, 16L, 25L, 19L, 21L, 25L, 19L, 
    22L), fatigue = c(59L, 51L, 53L, 62L, 54L, 49L, 62L, 56L, 54L
    ), headache = c(44L, 30L, 34L, 44L, 39L, 33L, 48L, 43L, 42L), 
        loss_smell_taste = c(21L, 21L, 19L, 25L, 19L, 23L, 28L, 20L, 
        19L), muscle_ache = c(47L, 44L, 43L, 60L, 46L, 43L, 56L, 
        45L, 46L), nasal_congestion = c(34L, 25L, 32L, 36L, 33L, 
        33L, 46L, 38L, 34L), nausea_vomiting = c(11L, 10L, 9L, 18L, 
        7L, 12L, 28L, 13L, 9L), shortness_breath = c(61L, 36L, 32L, 
        53L, 35L, 50L, 44L, 50L, 37L), sore_throat = c(46L, 36L, 
        39L, 51L, 49L, 50L, 57L, 45L, 49L), sputum = c(47L, 34L, 
        41L, 50L, 39L, 41L, 47L, 46L, 43L), temperature = c(20L, 
        31L, 31L, 32L, 23L, 18L, 38L, 23L, 20L)), row.names = c(NA, 
    -9L), groups = structure(list(Comorbidities = structure(1:9, .Label = c("asthma", 
    "diabetes_type_one", "diabetes_type_two", "heart_disease", "hypertension", 
    "kidney_disease", "liver_disease", "lung_condition", "obesity"
    ), class = "factor"), .rows = structure(list(1L, 2L, 3L, 4L, 
        5L, 6L, 7L, 8L, 9L), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = c(NA, 9L), class = c("tbl_df", 
    "tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
    "tbl_df", "tbl", "data.frame"))

Here is the k-means clustering I have applied:

data_scaled <- as.data.frame(scale(data_piv[2:14])) 
km_res <- kmeans(data_scaled, centers = 4, nstart = 25)

And tried to plot it :

fviz_cluster(km_res, data = data_piv)

And of course the picture above has been achieved with Comorbidities being transformed in an integer. Yet as I said, I dislike as it is not elegant. I want, instead of the numbers, to get the actual names of the factor variable. Can someone help?


Solution

  • This could help:

    library(factoextra)
    #Code
    data_piv <- as.data.frame(data_piv)
    data_piv$Comorbidities <- as.character(data_piv$Comorbidities)
    rownames(data_piv) <- data_piv$Comorbidities
    data_scaled <- as.data.frame(scale(data_piv[2:14])) 
    km_res <- kmeans(data_scaled, centers = 4, nstart = 25)
    fviz_cluster(km_res, data = data_scaled)
    

    enter image description here