r algorithm machine-learning cluster-analysis hierarchical-clustering

Clustering a set of countries based on cultural similarity in R

I am having trouble clustering countries using a measure of cultural correlation that I already have.

Basically, the dataset has 90 rows and 91 columns (90 country columns plus one that identifies the nation on each row) and looks like this:

 Nation Ita   Fra   Ger   Esp   Eng  ...
 Ita    NA    0.2   0.1   0.6   0.4  ...
 Fra    0.2   NA    0.2   0.1   0.3  ...
 Ger    0.7   0.1   NA    0.5   0.4
 Esp    0.6   0.1   0.5   NA    0.2
 Eng    0.4   0.3   0.4   0.2   NA
 ...                              .....
 ...

I am looking for an algorithm that clusters my countries into groups (for instance, groups of 3) or, even better, into more flexible clusters, where neither the number of clusters nor the number of countries per cluster is fixed ex ante.

The output would then be, for instance:

  Nation   cluster
  Ita       1
  Fra       2
  Ger       3
  Esp       1
  Eng       3
  ......

Solution

# Data
df1 = read.table(strip.white = TRUE, stringsAsFactors = FALSE, header = TRUE, text =
"Nation Ita   Fra   Ger   Esp   Eng
 Ita    NA    0.2   0.1   0.6   0.4
 Fra    0.2   NA    0.2   0.1   0.3
 Ger    0.7   0.1   NA    0.5   0.4
 Esp    0.6   0.1   0.5   NA    0.2
 Eng    0.4   0.3   0.4   0.2   NA")

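# Replace the NA diagonal with 0 so that PCA and k-means can run
df1 = replace(df1, is.na(df1), 0)
# Use the Nation column as row names, then drop it
row.names(df1) = df1[,1]
df1 = df1[,-1]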

# Run PCA to visualize similarities
pca = prcomp(as.matrix(df1))
pca_m = as.data.frame(pca$x)
plot(pca_m$PC1, pca_m$PC2)
text(x = pca_m$PC1, y = pca_m$PC2, labels = row.names(df1))

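# Run k-means and choose the number of centers based on the PCA plot.
# kmeans() starts from random centers, so call set.seed() first for
# reproducible results; labels and assignments can differ between runs.
kk = kmeans(x = df1, centers = 3)
kk$cluster
# Ita Fra Ger Esp Eng 
#   3   1   2   1   1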
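
Since the question asks for clusters whose number and size are not fixed in advance, hierarchical clustering may be a closer fit than k-means. The following is only a sketch under stated assumptions, not part of the original answer: it treats the values as similarities in [0, 1], averages the two directions because the sample matrix is not symmetric (Ita/Ger is 0.1 one way and 0.7 the other), converts to a distance as 1 - similarity, and cuts the tree at an arbitrary height of 0.6. The averaging step, the distance conversion, the "average" linkage and the cut height are all illustrative choices that would need tuning on the real 90-country data.

```r
# Sample similarity matrix from the question (rows and columns are nations)
sim <- as.matrix(read.table(header = TRUE, row.names = 1, text = "
Nation Ita   Fra   Ger   Esp   Eng
Ita    NA    0.2   0.1   0.6   0.4
Fra    0.2   NA    0.2   0.1   0.3
Ger    0.7   0.1   NA    0.5   0.4
Esp    0.6   0.1   0.5   NA    0.2
Eng    0.4   0.3   0.4   0.2   NA"))

# Assumption: average the two directions to get a symmetric similarity
sim <- (sim + t(sim)) / 2

# Assumption: distance = 1 - similarity; a nation is at distance 0 from itself
d <- 1 - sim
diag(d) <- 0

# Agglomerative clustering on the distance matrix
hc <- hclust(as.dist(d), method = "average")

# Cut by height instead of by a fixed k, so the number of clusters
# and their sizes are determined by the data (0.6 is an arbitrary choice)
clusters <- cutree(hc, h = 0.6)

# Output in the Nation / cluster layout requested in the question
result <- data.frame(Nation = names(clusters), cluster = unname(clusters))
print(result)
```

Calling plot(hc) draws the dendrogram, which can help in picking a sensible cut height; cutree(hc, k = 3) would instead force a fixed number of clusters.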