I am trying to follow along with this post.
I am stuck trying to get, from the clusters returned by kmeans, the probability that a 1 is followed by a 1, that a 1 is followed by a 2, and so on.
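```r
# Cluster on columns 2-4 of euFlat: 8 centers, 20 random starts, up to 100 iterations each
model <- kmeans(euFlat[, 2:4], centers = 8, iter.max = 100, nstart = 20)
# Attach each row's cluster label to a copy of the original OHLC data
euCluster <- euOrig
euCluster$Cluster <- model$cluster
```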
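For readers without the post's data, here is a minimal self-contained sketch of the same step on simulated candles. Everything in it is hypothetical: `candles`, `feats` and the three relative-price features are made up for illustration and are not the post's `euFlat` features. What it shows is that `model$cluster` holds one integer label per input row, in row order, so it can be attached as a column and read as a time-ordered sequence.

```r
# Hypothetical stand-in for the question's data: 200 synthetic candles.
set.seed(42)
n     <- 200
close <- 1.55 + cumsum(rnorm(n, sd = 0.002))
open  <- c(close[1], close[-n])                 # each bar opens at the previous close
high  <- pmax(open, close) + abs(rnorm(n, sd = 0.001))
low   <- pmin(open, close) - abs(rnorm(n, sd = 0.001))
candles <- data.frame(Open = open, High = high, Low = low, Close = close)

# Hypothetical features standing in for euFlat[, 2:4]:
# High, Low and Close expressed relative to Open.
feats <- data.frame(
  high  = (candles$High  - candles$Open) / candles$Open,
  low   = (candles$Low   - candles$Open) / candles$Open,
  close = (candles$Close - candles$Open) / candles$Open
)

# Same call shape as in the question, with fewer centers for the small sample.
model <- kmeans(feats, centers = 4, iter.max = 100, nstart = 20)

# One label per row, in the original row order.
candles$Cluster <- model$cluster
head(candles)
```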
The post presents these probabilities in a chart, but I have no idea how to generate it easily in R.
My data currently looks like this:
```
                      Open   High    Low  Close Volume Cluster
2008-06-25 18:00:00 1.5570 1.5587 1.5570 1.5585    191       8
2008-06-25 22:00:00 1.5584 1.5686 1.5539 1.5664   2141       7
2008-06-26 02:00:00 1.5663 1.5677 1.5661 1.5663    321       8
2008-06-26 06:00:00 1.5744 1.5749 1.5741 1.5747    131       8
2008-06-26 10:00:00 1.5748 1.5764 1.5723 1.5758    721       8
2008-06-26 14:00:00 1.5757 1.5767 1.5746 1.5750    351       8
```
The last column is the cluster assigned by kmeans.
Is there an easy way to do this in R without having to write a custom function?
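This can be done with `table()` by cross-tabulating each cluster against the previous one. Here `s` has a `cluster` column (with the question's data that would be `euCluster$Cluster`), and `lag` is `dplyr::lag`, which shifts the vector down by one position; `stats::lag` leaves a plain vector's values unchanged and would give a purely diagonal table. Rows are the current cluster and columns the previous one, so cell [i, j] counts how often cluster i followed cluster j: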
```
> table(s$cluster, lag(s$cluster))
     1  2  3  4  5  6
  1 43 15 14  5  7 42
  2 17  4 10  1  5 11
  3 17  9 16  2  3 16
  4  8  1  1  0  2  0
  5  4  8  4  1  3  5
  6 38 11 18  3  5 25
```
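A base-R sketch of the same counting, without dplyr, on a short made-up sequence `cl` (hypothetical data, not from the question). It pairs each element with the next one and names the dimensions so it is explicit which axis holds the earlier cluster. It also checks that the lag-style table, built here with `c(NA, x[-n])` (which is what `dplyr::lag(x)` returns), is simply the transpose of the from/to table.

```r
# Hypothetical cluster sequence, in time order.
cl <- c(1, 1, 2, 1, 3, 3, 1, 2, 2, 1)

# from[i] is immediately followed by to[i].
from <- head(cl, -1)  # all but the last observation
to   <- tail(cl, -1)  # all but the first observation

# Rows = previous cluster, columns = next cluster.
counts <- table(from = from, to = to)
print(counts)

# Lag-based version: rows = current, columns = previous.
# c(NA, cl[-length(cl)]) matches dplyr::lag(cl); table() drops the NA pair.
lagged <- c(NA, cl[-length(cl)])
alt <- table(cl, lagged)
stopifnot(all(unclass(alt) == t(unclass(counts))))
```

For this sequence, cluster 1 is followed by cluster 2 twice, so `counts["1", "2"]` is 2.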
```
> prop.table(table(s$cluster, lag(s$cluster))) * 100
             1          2          3          4          5          6
  1 11.4973262  4.0106952  3.7433155  1.3368984  1.8716578 11.2299465
  2  4.5454545  1.0695187  2.6737968  0.2673797  1.3368984  2.9411765
  3  4.5454545  2.4064171  4.2780749  0.5347594  0.8021390  4.2780749
  4  2.1390374  0.2673797  0.2673797  0.0000000  0.5347594  0.0000000
  5  1.0695187  2.1390374  1.0695187  0.2673797  0.8021390  1.3368984
  6 10.1604278  2.9411765  4.8128342  0.8021390  1.3368984  6.6844920
```
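Note that this divides every cell by the grand total, so the whole table sums to 100: each entry is that pair's share of all transitions, not the probability of one cluster given the previous one. To get that conditional probability (e.g. the probability that a 1 follows a 1), normalize each column instead, so that every column sums to 100 and column j is the distribution of the next cluster given that the previous one was j. `prop.table(table(s$cluster, lag(s$cluster)), margin = 2)` is equivalent to the `apply` call: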
```
> apply(table(s$cluster, lag(s$cluster)), 2, prop.table) * 100
         1  2         3         4         5         6
1 16.66667 12  6.557377  4.545455  6.306306  2.325581
2  0.00000 12  6.557377  7.272727  3.603604 13.953488
3 25.00000 12 24.590164 15.454545 11.711712 16.279070
4 25.00000 24 27.868852 28.181818 33.333333 32.558140
5 12.50000 16 18.032787 38.181818 34.234234 27.906977
6 20.83333 24 16.393443  6.363636 10.810811  6.976744
```
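To make the difference between the two normalizations concrete, here is a sketch that re-enters the counts printed by `table(s$cluster, lag(s$cluster))` (rows = current cluster, columns = previous cluster) and computes both. The names `counts`, `joint` and `cond` are illustrative. With these counts, the estimated probability that cluster 1 follows cluster 1 is 43/127, about 33.9%, whereas its joint share is 43/374, about 11.5%. The column-normalized output printed above does not correspond to these counts, so it appears to come from a different sample.

```r
# Counts copied from the table(s$cluster, lag(s$cluster)) output.
# Rows = current cluster, columns = previous cluster.
labs <- as.character(1:6)
counts <- matrix(c(43, 15, 14, 5, 7, 42,
                   17,  4, 10, 1, 5, 11,
                   17,  9, 16, 2, 3, 16,
                    8,  1,  1, 0, 2,  0,
                    4,  8,  4, 1, 3,  5,
                   38, 11, 18, 3, 5, 25),
                 nrow = 6, byrow = TRUE,
                 dimnames = list(current = labs, previous = labs))

# Joint shares: every cell divided by the grand total (whole table sums to 100).
joint <- prop.table(counts) * 100

# Conditional shares: each column divided by its own total (each column sums
# to 100); column j is the distribution of the next cluster given previous cluster j.
cond <- prop.table(counts, margin = 2) * 100

# Distribution of the next cluster after a 1.
round(cond[, "1"], 2)
```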