Tags: r, cluster-analysis, k-means, quantmod

In R, how do I get the probability that one cluster label follows another?


I am trying to follow along with this post.

I am stuck on getting, from the cluster assignments returned by kmeans, the probability that a 1 is followed by a 1, that a 1 is followed by a 2, and so on.
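To make the target quantity concrete: the probability that a 2 follows a 1 is the number of times label 1 is immediately followed by label 2, divided by the number of times label 1 is followed by anything. A minimal sketch on a made-up label vector (x is hypothetical, not the question's data):

```r
# Hypothetical sequence of cluster labels, one per time step
x <- c(1, 2, 1, 1, 3, 1, 2, 2, 1)

current <- head(x, -1)  # label at time t (drops the last step)
nxt     <- tail(x, -1)  # label at time t + 1 (drops the first step); "next" is reserved in R

# P(next = 2 | current = 1): among steps labelled 1, the share followed by a 2
p_2_after_1 <- mean(nxt[current == 1] == 2)
p_2_after_1
```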

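    # Cluster the rows on columns 2 to 4 of euFlat into 8 groups
    model <- kmeans(euFlat[, 2:4], centers = 8, iter.max = 100, nstart = 20)
    # Attach each row's cluster label to a copy of the original data
    # (assumes euFlat and euOrig have the same rows in the same order)
    euCluster <- euOrig
    euCluster$Cluster <- model$cluster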
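Because euFlat and euOrig are not shown, here is a self-contained sketch of the same clustering step on a synthetic stand-in. The data frame feats and its columns a, b, c are made up for illustration; the point is that model$cluster is one integer label per input row, in row order, which is what makes the column assignment line up (assuming the feature rows and the original rows match one to one).

```r
# Synthetic stand-in for euFlat[, 2:4]: three made-up feature columns
set.seed(42)
feats <- data.frame(a = rnorm(200), b = rnorm(200), c = rnorm(200))

model <- kmeans(feats, centers = 8, iter.max = 100, nstart = 20)

# model$cluster holds one label (1 to 8) per row of feats, in row order,
# so it can be attached to the same rows as a new column
clustered <- feats
clustered$Cluster <- model$cluster
head(clustered)
```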

The chart below is the one used in the post. However, I have no idea how to generate it easily in R.

[Image: probability chart from the post]

My data currently looks like:

                      Open   High    Low  Close Volume Cluster
2008-06-25 18:00:00 1.5570 1.5587 1.5570 1.5585    191       8
2008-06-25 22:00:00 1.5584 1.5686 1.5539 1.5664   2141       7
2008-06-26 02:00:00 1.5663 1.5677 1.5661 1.5663    321       8
2008-06-26 06:00:00 1.5744 1.5749 1.5741 1.5747    131       8
2008-06-26 10:00:00 1.5748 1.5764 1.5723 1.5758    721       8
2008-06-26 14:00:00 1.5757 1.5767 1.5746 1.5750    351       8

The last column, Cluster, holds the k-means assignment for each row.

Is there an easy way to do this in R without having to write a custom function?
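For comparison, a base-R sketch that needs no extra packages and no custom function: pair each label with the one after it using head() and tail(), cross-tabulate with table(), and normalise each row with prop.table(). The label vector x is hypothetical; with the question's data it would be the Cluster column converted to a plain vector.

```r
# Hypothetical cluster labels in time order (stand-in for the Cluster column)
x <- c(8, 7, 8, 8, 6, 8, 7, 7, 8)

# Pair each label with the one that follows it; fixed factor levels keep
# every cluster as a row and column even if some transition never occurs
lv   <- sort(unique(x))
from <- factor(head(x, -1), levels = lv)
to   <- factor(tail(x, -1), levels = lv)

counts <- table(from, to)

# Normalise each row: entry [i, j] estimates P(next = j | current = i)
trans <- prop.table(counts, margin = 1)
round(trans, 3)
```

Here rows are the current cluster and columns the next one, so "the probability that a 2 follows a 1" is trans["1", "2"]. A cluster that only ever appears as the last observation has no outgoing transitions, and its row comes out as NaN (0/0).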


Solution

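  • Cross-tabulate each cluster label against the previous one. This assumes a lag() that shifts the vector by one position, such as dplyr::lag() (the first element becomes NA, which table() drops). Base R's stats::lag() only changes the time attributes of a plain vector without moving the values, so with it every label would be counted against itself. Rows are the current cluster, columns the previous cluster. Calling prop.table() on the whole table gives the percentage of all consecutive pairs (the entire table sums to 100), not conditional probabilities: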

    > table(s$cluster , lag(s$cluster))
    
         1  2  3  4  5  6
      1 43 15 14  5  7 42
      2 17  4 10  1  5 11
      3 17  9 16  2  3 16
      4  8  1  1  0  2  0
      5  4  8  4  1  3  5
      6 38 11 18  3  5 25
    > prop.table(table(s$cluster , lag(s$cluster))) * 100
    
                 1          2          3          4          5          6
      1 11.4973262  4.0106952  3.7433155  1.3368984  1.8716578 11.2299465
      2  4.5454545  1.0695187  2.6737968  0.2673797  1.3368984  2.9411765
      3  4.5454545  2.4064171  4.2780749  0.5347594  0.8021390  4.2780749
      4  2.1390374  0.2673797  0.2673797  0.0000000  0.5347594  0.0000000
      5  1.0695187  2.1390374  1.0695187  0.2673797  0.8021390  1.3368984
      6 10.1604278  2.9411765  4.8128342  0.8021390  1.3368984  6.6844920
    

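    To get conditional probabilities instead, normalise by column, so each column sums to 100 and entry [i, j] is the percentage of steps in cluster i among those that immediately follow a step in cluster j. Note that the output shown here does not come from the counts above: with those counts the top-left cell would be 43/127, about 33.9%, rather than 16.7%.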

    > apply(table(s$cluster, lag(s$cluster)), 2, prop.table) * 100
    
               1  2         3         4         5         6
      1 16.66667 12  6.557377  4.545455  6.306306  2.325581
      2  0.00000 12  6.557377  7.272727  3.603604 13.953488
      3 25.00000 12 24.590164 15.454545 11.711712 16.279070
      4 25.00000 24 27.868852 28.181818 33.333333 32.558140
      5 12.50000 16 18.032787 38.181818 34.234234 27.906977
      6 20.83333 24 16.393443  6.363636 10.810811  6.976744
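A sketch that rebuilds the count table from the first output in the answer (typed in by hand, assuming rows are the current cluster and columns the previous one, as a shifting lag() produces) and compares the two normalisations. prop.table(x, margin = 2) is the base-R shorthand for the apply() call and gives the same matrix.

```r
# Transition counts copied from the table() output in the answer
# (rows = current cluster, columns = previous cluster)
counts <- as.table(matrix(
  c(43, 15, 14, 5, 7, 42,
    17,  4, 10, 1, 5, 11,
    17,  9, 16, 2, 3, 16,
     8,  1,  1, 0, 2,  0,
     4,  8,  4, 1, 3,  5,
    38, 11, 18, 3, 5, 25),
  nrow = 6, byrow = TRUE,
  dimnames = list(current = as.character(1:6),
                  previous = as.character(1:6))))

# Joint percentages: the whole table sums to 100
joint <- prop.table(counts) * 100

# Conditional percentages: each column sums to 100, so entry [i, j]
# estimates P(current = i | previous = j) * 100
cond <- prop.table(counts, margin = 2) * 100

# Same values as the apply() form used in the answer
cond_apply <- apply(counts, 2, prop.table) * 100
all.equal(unclass(cond), cond_apply, check.attributes = FALSE)

# Where the series goes next after a step in cluster 1
round(cond[, "1"], 2)
```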