r, csv, decimal, rstudio, cluster-analysis

Different results for same dataset in Cluster Analyses with R Studio?


I have just started using R and have a question about cluster analysis. I use the agnes function to cluster my dataset, but I noticed that the cluster results and the pltrees differ depending on whether I read the data from the .txt file or the .csv file.

Maybe it would be better to explain my problem with the images:

My dataset in .txt format: [image]

I used the following code to read the data into R:

data01 <- read.table("D:/CLUSTER_ANALYSIS/NumericData3_IN.txt", header = T)

and everything seems fine: [image]

Then I apply the cluster analysis:

library(cluster)  # agnes() comes from the cluster package
complete1 <- agnes(data01, stand = FALSE, method = 'complete')
plot(complete1, which.plots=2, main='Complete-Linkage')

And here is the pltree: [image]

I repeated the same steps with a .csv file that contains exactly the same dataset. Here is the dataset in .csv format: [image]

Again the cluster analysis for .csv file:

data02 <- read.csv("D:/CLUSTER_ANALYSIS/NumericData3.csv", header = T)

complete2 <- agnes(data02, stand = FALSE, method = 'complete')

plot(complete2, which.plots=2, main='Complete-Linkage')

And the pltree is completely different: [image]

So the decimal separator in the .txt file is a comma, and in the .csv file it is a dot. Which of these results is correct? Does R expect a comma or a dot as the decimal separator in numeric data?


Solution

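  • According to the R documentation for read.table (and read.csv), the default decimal separator is a dot for both functions. Your .txt file uses commas, so with the defaults its values are not parsed as numbers, and the result from the .csv file is the one computed on the actual numeric values. You can set the separator to whatever you need with the "dec" parameter, e.g.: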

    data01 <- read.table("D:/CLUSTER_ANALYSIS/NumericData3_IN.txt", header = T, dec=",")
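To see the effect of "dec" without the original files, here is a minimal sketch on a small made-up table. The sample values and the names txt, wrong and right are assumptions for illustration only, not the asker's data. It reads the same comma-decimal text twice, once with the default and once with dec = ",", and checks the resulting column types.

```r
# Hypothetical sample with comma decimals (not the asker's dataset).
txt <- "x y\n1,5 2,25\n3,0 4,75\n"

# Default dec = ".": "1,5" is not recognised as a number,
# so the columns are not numeric.
wrong <- read.table(text = txt, header = TRUE)

# dec = ",": the same text is parsed into numeric columns.
right <- read.table(text = txt, header = TRUE, dec = ",")

# Inspect the column types before passing the data to agnes().
str(wrong)
str(right)
sapply(right, is.numeric)
```

Running str() on a data frame right after reading it is a quick way to catch this kind of problem before clustering. For files that use a comma as the decimal mark and a semicolon as the field separator, read.csv2 already has those defaults.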