I'm pretty new to R so I was following a guide for cluster analysis, and when I get to using get_dist() I keep getting the error Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric. When I remove the column with the <chr> data, it works fine, but the thing is, I want to keep these labels in, like the "state" labels in the USArrests dataset.
I found a question that was pretty similar to mine over here, but none of the comments or answers helped me. I've seen a few posts, such as this one, that mention trying get_dist(x$x) or as.numeric(as.character(x$x)), but I must admit that this workaround doesn't make much sense to me, nor have I had much success implementing these suggestions.
I can't show my full data set, but I can provide the results of head(), and I have noticed that it differs from head(USArrests):
library(readxl)
Mother_2_ABS_Summer_2019_clean <- read_excel("~/.../Mother_2_ABS_Summer_2019_clean.xls",
range = "D1:H61")
head(Mother_2_ABS_Summer_2019_clean)
...1 Audience Genre Structure Proofreading
<chr> <dbl> <dbl> <dbl> <dbl>
ABS-P_29_S31 2 2 2.0 3
ABS_40_S50 3 3 3.5 3
ABS_57_S47 2 2 2.0 3
ABS_86_S48 4 3 3.0 4
ABS_143_S42 2 2 2.0 3
ABS-P_152_S49 2 1 1.0 4
head(USArrests)
Murder Assault UrbanPop Rape
<dbl> <int> <int> <dbl>
Alabama 13.2 236 58 21.2
Alaska 10.0 263 48 44.5
Arizona 8.1 294 80 31.0
Arkansas 8.8 190 50 19.5
California 9.0 276 91 40.6
Colorado 7.9 204 78 38.7
So what I've noticed is that in USArrests, the state labels aren't categorized as <chr>, unlike my identifications for the documents.
When I follow the guide, I have no problems up until get_dist():
dat1 <- na.omit(Mother_2_ABS_Summer_2019_clean)
dat1 <- scale(dat1)
distance <- get_dist(dat1)
fviz_dist(distance, gradient = list(low = "#00AFBB", mid = "white", high = "#FC4E07"))
Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric
When I import only the 4 columns that contain numeric data and go through the guide, everything works just fine and I can view the cluster results. The problem is that I want to see the visualizations WITH the document identifications; otherwise the results don't mean too much when looking at them.
If any of you have any advice or suggestions, it would be greatly appreciated.
UNTESTED: You could assign those labels as the row names:
library(tidyverse)
dat1 <- Mother_2_ABS_Summer_2019_clean %>%
  remove_rownames() %>%
  column_to_rownames(var = "...1")
Maybe consider renaming the first column beforehand so the call above is cleaner and more likely to work. The result then has the same format as USArrests, where the state names are row names rather than a <chr> column.
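For what it's worth, here is a base-R sketch of the same idea that doesn't need the tidyverse. The data frame docs and its column name id are hypothetical stand-ins, filled in from the six rows shown in the question's head() output. It uses dist() from base R, and I'd expect get_dist() from factoextra to accept the same scaled matrix, but I haven't run that part.

```r
# Hypothetical stand-in for the imported sheet, copied from the head() output
# in the question; "id" is an assumed name for the unnamed first column.
docs <- data.frame(
  id = c("ABS-P_29_S31", "ABS_40_S50", "ABS_57_S47",
         "ABS_86_S48", "ABS_143_S42", "ABS-P_152_S49"),
  Audience = c(2, 3, 2, 4, 2, 2),
  Genre = c(2, 3, 2, 3, 2, 1),
  Structure = c(2, 3.5, 2, 3, 2, 1),
  Proofreading = c(3, 3, 3, 4, 3, 4),
  stringsAsFactors = FALSE
)

# Move the labels out of the data and into the row names.
labelled <- docs[, -1]
rownames(labelled) <- docs$id

# Every remaining column is numeric, so scale() has nothing to choke on.
scaled <- scale(na.omit(labelled))

# The row names are carried through to the distance object as its labels.
distance <- dist(scaled)
print(attr(distance, "Labels"))
```

Since the labels travel with the row names, they should be what shows up on the axes of plots built from the distance object.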