This is the code I used:
resources <- read.csv("https://raw.githubusercontent.com/umbertomig/intro-prob-stat-FGV/master/datasets/resources.csv")
res <- subset(resources, select = c("cty_name", "year", "regime",
"oil", "logGDPcp", "illit"))
resNoNA <- na.omit(res)
resNoNAS <- scale(resNoNA[, 3:6])
colMeans(resNoNA[, 3:6])
apply(resNoNA[, 3:6], 2, sd)
cluster2 <- kmeans(resNoNAS, centers = 2)
table(cluster2$cluster)
## this gives the centers on the standardized scale, which is hard to interpret
cluster2$centers
## better to subset the original data and then compute means
g1 <- resNoNA[cluster2$cluster == 1, ]
colMeans(g1[, 3:6])
g2 <- resNoNA[cluster2$cluster == 2, ]
colMeans(g2[, 3:6])
plot(x = resNoNA$logGDPcp, y = resNoNA$illit, main = "Illiteracy v GDP",
xlab = "GDP per Capita", ylab = "Illiteracy",
col = cluster2$cluster, cex = resNoNA$oil)
But I wanted to make the circles smaller so that they fit within the limits of the graph.
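You control the circle size with cex= here. Dividing the values by a constant shrinks every circle proportionally; compare the original plot with the versions divided by 3 and by 5: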
plot(x = resNoNA$logGDPcp, y = resNoNA$illit, main = "Illiteracy v GDP",
xlab = "GDP per Capita", ylab = "Illiteracy",
col = cluster2$cluster, cex = resNoNA$oil)
plot(x = resNoNA$logGDPcp, y = resNoNA$illit, main = "Illiteracy v GDP",
xlab = "GDP per Capita", ylab = "Illiteracy",
col = cluster2$cluster, cex = resNoNA$oil/3)
plot(x = resNoNA$logGDPcp, y = resNoNA$illit, main = "Illiteracy v GDP",
xlab = "GDP per Capita", ylab = "Illiteracy",
col = cluster2$cluster, cex = resNoNA$oil/5)
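If picking a divisor by trial and error is inconvenient, one alternative is to map the values linearly onto a fixed range of sizes. The following is only a sketch: rescale_cex is a hypothetical helper (not part of base R or of the original post), the bounds 0.5 and 3 are arbitrary assumptions, and the toy vector stands in for resNoNA$oil.

```r
# Hypothetical helper: map a numeric vector linearly onto [lo, hi]
# so that the result can be passed to cex=.
rescale_cex <- function(x, lo = 0.5, hi = 3) {
  rng <- range(x, na.rm = TRUE)
  # If all values are equal, fall back to the midpoint of the target range
  if (rng[1] == rng[2]) return(rep((lo + hi) / 2, length(x)))
  lo + (x - rng[1]) / (rng[2] - rng[1]) * (hi - lo)
}

# Toy values standing in for resNoNA$oil
oil <- c(0, 2, 5, 10)
sizes <- rescale_cex(oil)
sizes
```

With the real data this would be used as cex = rescale_cex(resNoNA$oil) in the plot() call. Unlike plain division, the smallest value still gets a visible point (size lo) instead of shrinking toward zero.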
Note, however, that if you are using this in an automated report generator (e.g., rmarkdown, shiny), you may need to adjust the dimensions of the plot and control it from the other angle: update xlim and ylim.
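A minimal sketch of the xlim/ylim approach, assuming a 10% margin on each side is enough room for the largest circles. pad_range is a hypothetical helper, and the toy vectors stand in for resNoNA$logGDPcp, resNoNA$illit and resNoNA$oil; the margin will likely need tuning for the real data.

```r
# Toy data standing in for the columns of resNoNA
gdp   <- c(6.1, 7.4, 8.9, 9.6)
illit <- c(60, 35, 12, 5)
oil   <- c(1, 4, 0.5, 8)

# Hypothetical helper: widen a variable's range by a fraction on each side
pad_range <- function(x, frac = 0.1) {
  rng <- range(x, na.rm = TRUE)
  rng + c(-1, 1) * frac * diff(rng)
}

xl <- pad_range(gdp)
yl <- pad_range(illit)

# Same kind of plot as above, but with explicitly widened axis limits
plot(x = gdp, y = illit, main = "Illiteracy v GDP",
     xlab = "GDP per Capita", ylab = "Illiteracy",
     cex = oil, xlim = xl, ylim = yl)
```

This leaves the circle sizes untouched and gives them more room instead, which can be combined with scaling cex down.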