How to compute the frequency of transitions between two states?


My dataset looks like this:

Data <- read.table(header=TRUE, text="
itemset
aac,
cca,
bab,
caa,
aba,
abb,
cab,
bcc,
aca,
bab,
cca,
cac,
baa,
baa,
abc,
abb,
cbb,
baa,
cba,
acb,
ccb,
bbc,
aac,
bac,
abb,
bba,
bca,
acc,
caa,
cca")

Let's say that each line corresponds to one state. I need to compute the frequency of the transitions between neighboring states, i.e. from each line to the next one.
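A minimal base-R sketch of what counting transitions between neighboring states means. It assumes the trailing commas in the input are formatting noise and can be dropped. Each state is paired with the one on the following line using head() and tail(), and the pairs are then tabulated the same way single items are tabulated below. The variable names (states, from, to, pairs, trans) are illustrative only.

```r
# The 30 states from the question, trailing commas dropped (assumption).
states <- c("aac", "cca", "bab", "caa", "aba", "abb", "cab", "bcc", "aca", "bab",
            "cca", "cac", "baa", "baa", "abc", "abb", "cbb", "baa", "cba", "acb",
            "ccb", "bbc", "aac", "bac", "abb", "bba", "bca", "acc", "caa", "cca")

# Line i is the "from" state and line i + 1 the "to" state,
# so n states give n - 1 transitions.
from <- head(states, -1)
to   <- tail(states, -1)

# Frequency and proportion of each observed transition, most frequent first,
# mirroring the freq/prop table built for single items.
pairs <- table(paste(from, to, sep = " -> "))
trans <- cbind(freq = pairs, prop = pairs / length(from))
trans <- trans[order(trans[, "freq"], decreasing = TRUE), ]
head(trans)
```

With this data, each of the 29 transitions occurs exactly once, so every proportion is 1/29.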

Question: is there a standard function for this?

I found a partial answer here, which computes the frequency and proportion of each single item:

cbind(table(Data), table(Data) / nrow(Data))

Tab <- table(Data)                        # observed freq.
Tab <- cbind(Tab, Tab/nrow(Data))             # combine freq. and prop.
Tab <- Tab[order(Tab[,2], decreasing=TRUE),]  # sort
colnames(Tab) <- c("freq", "prop")        # add column names

dim(Tab)[1] is 22, so the transition matrix should be 22x22.
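The count of 22 comes from the input format. read.table() splits on whitespace by default, so the trailing commas stay inside the values. Every value except the last one ends in a comma, which means "cca," and "cca" are counted as two different states. Below is a minimal base-R sketch, assuming the commas should be dropped. It strips them with sub() and then cross-tabulates each state against its successor. Giving both margins the same factor levels keeps the matrix square even if some state never appears as a successor. The names raw, states, lev and counts are illustrative.

```r
# The 30 values as read.table() returns them for the question's input:
# every value keeps its trailing comma except the last one.
raw <- c("aac,", "cca,", "bab,", "caa,", "aba,", "abb,", "cab,", "bcc,", "aca,", "bab,",
         "cca,", "cac,", "baa,", "baa,", "abc,", "abb,", "cbb,", "baa,", "cba,", "acb,",
         "ccb,", "bbc,", "aac,", "bac,", "abb,", "bba,", "bca,", "acc,", "caa,", "cca")
length(unique(raw))            # 22: "cca," and "cca" are distinct strings

states <- sub(",$", "", raw)   # drop the trailing comma
lev <- sort(unique(states))    # 21 distinct states

# Count transitions from line i to line i + 1; sharing the levels between
# both margins keeps the matrix square.
counts <- table(from = factor(head(states, -1), levels = lev),
                to   = factor(tail(states, -1), levels = lev))
dim(counts)                    # 21 21
```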


Solution

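  • Another way, using reshape2, which yields a 21x21 transition probability matrix once the trailing commas are stripped from the item names:

    library(reshape2)
    # read.table() keeps the trailing commas, so "cca," and "cca" would be
    # different states; strip them (as.character() also covers factor columns)
    Data$itemset <- sub(",$", "", as.character(Data$itemset))
    Data$nextitem <- c(Data$itemset[-1], NA)  # successor of each state; the last has none
    Data$value <- 1                           # one count per transition
    df <- dcast(Data, itemset ~ nextitem, value.var = "value",
                fun.aggregate = sum, fill = 0)
    df <- df[-ncol(df)]                       # drop the "NA" column (no successor)
    df[-1] <- df[-1] / rowSums(df[-1])        # row-normalize; assumes no all-zero rows
    df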
    #   itemset aac aba abb       abc aca acb acc       baa bab bac       bba bbc bca bcc caa       cab cac       cba       cbb cca ccb
    #1      aac   0 0.0   0 0.0000000   0   0   0 0.0000000 0.0 0.5 0.0000000   0   0   0 0.0 0.0000000 0.0 0.0000000 0.0000000 0.5   0
    #2      aba   0 0.0   1 0.0000000   0   0   0 0.0000000 0.0 0.0 0.0000000   0   0   0 0.0 0.0000000 0.0 0.0000000 0.0000000 0.0   0
    #3      abb   0 0.0   0 0.0000000   0   0   0 0.0000000 0.0 0.0 0.3333333   0   0   0 0.0 0.3333333 0.0 0.0000000 0.3333333 0.0   0
    #4      abc   0 0.0   1 0.0000000   0   0   0 0.0000000 0.0 0.0 0.0000000   0   0   0 0.0 0.0000000 0.0 0.0000000 0.0000000 0.0   0
    #5      aca   0 0.0   0 0.0000000   0   0   0 0.0000000 1.0 0.0 0.0000000   0   0   0 0.0 0.0000000 0.0 0.0000000 0.0000000 0.0   0
    #6      acb   0 0.0   0 0.0000000   0   0   0 0.0000000 0.0 0.0 0.0000000   0   0   0 0.0 0.0000000 0.0 0.0000000 0.0000000 0.0   1
    #7      acc   0 0.0   0 0.0000000   0   0   0 0.0000000 0.0 0.0 0.0000000   0   0   0 1.0 0.0000000 0.0 0.0000000 0.0000000 0.0   0
    #8      baa   0 0.0   0 0.3333333   0   0   0 0.3333333 0.0 0.0 0.0000000   0   0   0 0.0 0.0000000 0.0 0.3333333 0.0000000 0.0   0
    #9      bab   0 0.0   0 0.0000000   0   0   0 0.0000000 0.0 0.0 0.0000000   0   0   0 0.5 0.0000000 0.0 0.0000000 0.0000000 0.5   0
    #10     bac   0 0.0   1 0.0000000   0   0   0 0.0000000 0.0 0.0 0.0000000   0   0   0 0.0 0.0000000 0.0 0.0000000 0.0000000 0.0   0
    #11     bba   0 0.0   0 0.0000000   0   0   0 0.0000000 0.0 0.0 0.0000000   0   1   0 0.0 0.0000000 0.0 0.0000000 0.0000000 0.0   0
    #12     bbc   1 0.0   0 0.0000000   0   0   0 0.0000000 0.0 0.0 0.0000000   0   0   0 0.0 0.0000000 0.0 0.0000000 0.0000000 0.0   0
    #13     bca   0 0.0   0 0.0000000   0   0   1 0.0000000 0.0 0.0 0.0000000   0   0   0 0.0 0.0000000 0.0 0.0000000 0.0000000 0.0   0
    #14     bcc   0 0.0   0 0.0000000   1   0   0 0.0000000 0.0 0.0 0.0000000   0   0   0 0.0 0.0000000 0.0 0.0000000 0.0000000 0.0   0
    #15     caa   0 0.5   0 0.0000000   0   0   0 0.0000000 0.0 0.0 0.0000000   0   0   0 0.0 0.0000000 0.0 0.0000000 0.0000000 0.5   0
    #16     cab   0 0.0   0 0.0000000   0   0   0 0.0000000 0.0 0.0 0.0000000   0   0   1 0.0 0.0000000 0.0 0.0000000 0.0000000 0.0   0
    #17     cac   0 0.0   0 0.0000000   0   0   0 1.0000000 0.0 0.0 0.0000000   0   0   0 0.0 0.0000000 0.0 0.0000000 0.0000000 0.0   0
    #18     cba   0 0.0   0 0.0000000   0   1   0 0.0000000 0.0 0.0 0.0000000   0   0   0 0.0 0.0000000 0.0 0.0000000 0.0000000 0.0   0
    #19     cbb   0 0.0   0 0.0000000   0   0   0 1.0000000 0.0 0.0 0.0000000   0   0   0 0.0 0.0000000 0.0 0.0000000 0.0000000 0.0   0
    #20     cca   0 0.0   0 0.0000000   0   0   0 0.0000000 0.5 0.0 0.0000000   0   0   0 0.0 0.0000000 0.5 0.0000000 0.0000000 0.0   0
    #21     ccb   0 0.0   0 0.0000000   0   0   0 0.0000000 0.0 0.0 0.0000000   1   0   0 0.0 0.0000000 0.0 0.0000000 0.0000000 0.0   0
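In base R, a similar row-normalized matrix can be sketched without reshape2. Cross-tabulate each state against its successor, then divide each row by its total with prop.table(). This is an alternative to the dcast() code above, not a reproduction of it. It assumes the commas have already been removed, and the names states, lev, counts and P are illustrative. Because prop.table() divides by the row totals, a state that appeared only on the last line would get a row of NaN (0/0). No state does in this data.

```r
# Question's data with the trailing commas already removed (assumption).
states <- c("aac", "cca", "bab", "caa", "aba", "abb", "cab", "bcc", "aca", "bab",
            "cca", "cac", "baa", "baa", "abc", "abb", "cbb", "baa", "cba", "acb",
            "ccb", "bbc", "aac", "bac", "abb", "bba", "bca", "acc", "caa", "cca")
lev <- sort(unique(states))

# Transition counts: rows are the current state, columns the next state.
counts <- table(from = factor(head(states, -1), levels = lev),
                to   = factor(tail(states, -1), levels = lev))

# Divide each row by its total, so P[i, j] estimates P(next = j | current = i).
P <- prop.table(counts, margin = 1)
round(P["abb", c("bba", "cab", "cbb")], 3)   # 0.333 each, as in row abb above
```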