I have a large number of (X, Y) pairs along with a cluster membership column. The cluster membership (group) may not always be right, because the clustering algorithm is imperfect, so I want to interactively visualize the clusters and change the cluster membership of the points I identify.
I tried rggobi, and the following is as far as I got. (I do not need to use rggobi / ggobi; if better options are available, you are welcome to suggest them.)
# data
set.seed (1234)
c1 <- rnorm (40, 0.1, 0.02); c2 <- rnorm (40, 0.3, 0.01)
c3 <- rnorm (40, 0.5, 0.01); c4 <- rnorm (40, 0.7, 0.01)
c5 <- rnorm (40, 0.9, 0.03)
Yv <- 0.3 + rnorm (200, 0.05, 0.05)
myd <- data.frame (Xv = round (c(c1, c2, c3, c4, c5), 2), Yv = round (Yv, 2),
cltr = factor (rep(1:5, each = 40)))
library(rggobi)  # library() stops with an error if rggobi is not installed
g <- ggobi(myd)
display(g[1], vars=list(X="Xv", Y="Yv"))
You can see five clusters, colored differently by the cltr variable. I manually identified the points that are outliers, and I want to set their cltr value to NA. Is there an easy way to remove such memberships and write the result to a file?
You could try identify() to get the indices of the points manually:
## use base::plot
plot(myd$Xv, myd$Yv, col=myd$cltr)
exclude <- identify(myd$Xv, myd$Yv) ## left click on the points you want to exclude (right click to stop/finish)
myd$cltr[exclude] <- NA
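The question also asks about writing the result to a file. A minimal sketch of that step follows, assuming a CSV file is acceptable. Because identify() needs an interactive graphics device, the exclude vector here holds hypothetical indices standing in for whatever points you click, and the file name myd_edited.csv is likewise only an illustration.

```r
# Rebuild the example data from the question so this runs on its own
set.seed(1234)
Xv <- c(rnorm(40, 0.1, 0.02), rnorm(40, 0.3, 0.01), rnorm(40, 0.5, 0.01),
        rnorm(40, 0.7, 0.01), rnorm(40, 0.9, 0.03))
Yv <- 0.3 + rnorm(200, 0.05, 0.05)
myd <- data.frame(Xv = round(Xv, 2), Yv = round(Yv, 2),
                  cltr = factor(rep(1:5, each = 40)))

# Hypothetical indices standing in for the result of identify()
exclude <- c(3, 57, 142)
myd$cltr[exclude] <- NA

# Write the edited memberships to a file (hypothetical name, in a temp dir);
# missing values are written as the string NA by default
out_file <- file.path(tempdir(), "myd_edited.csv")
write.csv(myd, out_file, row.names = FALSE)

# Read the file back and list the rows whose membership was removed
chk <- read.csv(out_file)
which(is.na(chk$cltr))
```

In an interactive session you would replace the hard-coded exclude vector with the value returned by identify(myd$Xv, myd$Yv) and point out_file at wherever you want the file saved.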