I have just started using R and have a question about cluster analysis. I use the agnes function (from the cluster package) on my dataset, but I noticed that the cluster results and the pltrees differ depending on whether I read the data from a .txt file or a .csv file.
Maybe it would be better to explain my problem with the images:
My dataset in .txt format:
I used the following code to read the data into R:
data01 <- read.table("D:/CLUSTER_ANALYSIS/NumericData3_IN.txt", header = TRUE)
and everything seems to be fine:
I then apply the cluster analysis:
library(cluster)  # provides agnes()
complete1 <- agnes(data01, stand = FALSE, method = 'complete')
plot(complete1, which.plots=2, main='Complete-Linkage')
And here is the pltree:
I repeated the same steps with a .csv file that contains exactly the same dataset. Here is the dataset in .csv format:
Again, the cluster analysis for the .csv file:
data02 <- read.csv("D:/CLUSTER_ANALYSIS/NumericData3.csv", header = TRUE)
complete2 <- agnes(data02, stand = FALSE, method = 'complete')
plot(complete2, which.plots=2, main='Complete-Linkage')
And the pltree is completely different.
The only difference I can find is that the decimal separator in the .txt file is a comma, while in the .csv file it is a dot. Which of these results is correct? Does R expect a comma or a dot as the decimal separator in a numeric dataset?
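The R manual for read.table (and read.csv) lists the default separators. For both of the functions you used, the default decimal separator is a dot, so the comma-separated values in your .txt file were most likely not read as numbers, and the result from the dot-separated .csv file is the one to trust. You can set the decimal separator to whatever your file uses with the "dec" parameter. For example: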
data01 <- read.table("D:/CLUSTER_ANALYSIS/NumericData3_IN.txt", header = TRUE, dec = ",")
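To check whether the columns were actually read as numbers before clustering, something like the following sketch may help. It does not use your files: the two-column sample in txt is made up for illustration and is passed to read.table through its text argument, and the names wrong and right are just placeholders.

```r
# Hypothetical sample that mimics a whitespace-separated file
# whose numbers use a comma as the decimal separator.
txt <- "a b\n1,5 2,25\n3,0 4,75"

# Default dec = ".": values such as "1,5" are not recognised as numbers.
wrong <- read.table(text = txt, header = TRUE)

# dec = ",": the same values are parsed as numeric.
right <- read.table(text = txt, header = TRUE, dec = ",")

str(wrong)  # columns are character (or factor in R versions before 4.0)
str(right)  # columns are numeric

# One logical value per column; anything that is not TRUE needs attention
# before the data frame is passed to agnes().
sapply(wrong, is.numeric)
sapply(right, is.numeric)
```

Running str(data01) on your own data frame right after reading it should show the same symptom if the decimal separator is the cause. For files that use a comma as the decimal separator and a semicolon as the field separator, read.csv2 has those as its defaults.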