Tags: r, statistics, probability, data-transform

Joint probability of pairs in R


I have a matrix that has consecutive pairs of values from a sequence.

For example, in a sequence like [1,1,3,3,3,4,4,2,4,2,2], I would have the following pairs stored in a matrix.

1, 1
1, 3
3, 3
3, 3
3, 4
4, 4
4, 2
2, 4
4, 2
2, 2

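I want to get the probability of occurrence for each unique pair.

For a pair (a,b), the value I want is count(a,b)/count(a), where count(a) is the number of pairs that start with a. This is the conditional probability of b following a (strictly speaking not the joint probability, which would be cond_prob(b|a)*prob(a)):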

(1,1) 0.5
(1,3) 0.5
(3,3) 0.67
and so on..

Is there any way I can do this in R without many loops, for example with built-in functions or libraries? Could someone help me do this efficiently?


Solution

  • How about this?

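    d <- c(1, 1, 3, 3, 3, 4, 4, 2, 4, 2, 2)
    # All bigrams: pair each element with its successor (vectorized, no loop)
    tr <- data.frame(x = head(d, -1), y = tail(d, -1))
    tbl <- table(tr)  # counts of each (x, y) pair
    # Dividing by rowSums normalizes each row, so the entries are P(y | x),
    # which matches the values asked for in the question
    joint_prob <- tbl / rowSums(tbl)
    joint_prob[1,1]
    # 0.5
    joint_prob[1,3]
    # 0.5
    joint_prob[3,3]
    # 0.6666667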
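
Since the question starts from a two-column matrix of pairs and mixes the terms "joint" and "conditional", here is a minimal sketch that computes both quantities with base R's table() and prop.table(). The names pair_mat, counts, cond_prob, joint and res are illustrative choices, not part of the original answer, and the matrix is rebuilt from the example sequence only to keep the snippet self-contained.

```r
# Consecutive pairs as a two-column matrix, as described in the question
# (rebuilt here from the example sequence; pair_mat is a hypothetical name).
d <- c(1, 1, 3, 3, 3, 4, 4, 2, 4, 2, 2)
pair_mat <- cbind(x = head(d, -1), y = tail(d, -1))

# Pair counts: rows are the first element, columns the second.
counts <- table(x = pair_mat[, "x"], y = pair_mat[, "y"])

# Conditional probability P(y | x): each row sums to 1.
cond_prob <- prop.table(counts, margin = 1)

# Joint probability P(x, y): the whole table sums to 1.
joint <- prop.table(counts)

# Long format, keeping only the pairs that actually occur.
res <- as.data.frame(cond_prob, responseName = "cond_prob")
res$joint_prob <- as.data.frame(joint)$Freq
res <- res[res$cond_prob > 0, ]
print(res)
```

With margin = 1 the result should agree with the tbl / rowSums(tbl) table above, while the joint column divides each count by the total number of pairs instead of by the row total.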