My dataset looks like this:
Data <- read.table(header=TRUE, text="
itemset
aac,
cca,
bab,
caa,
aba,
abb,
cab,
bcc,
aca,
bab,
cca,
cac,
baa,
baa,
abc,
abb,
cbb,
baa,
cba,
acb,
ccb,
bbc,
aac,
bac,
abb,
bba,
bca,
acc,
caa,
cca")
Let's say that each line corresponds to one state. I need to compute the frequencies of the transitions between neighboring states, i.e. how often each state is followed by each other state.
Question: is there a standard function for this?
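Base R can already cross-tabulate each state against the state that follows it: table() on the sequence minus its last element and the sequence minus its first element gives transition counts, and prop.table() with margin = 1 turns them into row-wise transition probabilities. A minimal sketch, assuming the states are held in a plain character vector (here a short hypothetical vector x, not the full data set):

```r
# Hypothetical sequence of states, in observation order
x <- c("aac", "cca", "bab", "cca", "bab", "aac")

# Pair each state with its successor:
# "from" is every state except the last, "to" every state except the first
from <- head(x, -1)
to   <- x[-1]

# Transition counts: rows are the current state, columns the next state
counts <- table(from, to)

# Normalise each row so it sums to 1 (transition probabilities)
probs <- prop.table(counts, margin = 1)

print(counts)
print(probs)
```

Here cca is followed by bab twice, so counts["cca", "bab"] is 2 and the cca row of probs is 1 in the bab column.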
I have found a partial answer here (it gives the frequency of each state, not of the transitions):
cbind(table(Data), table(Data) / nrow(Data))
Tab <- table(Data) # observed freq.
Tab <- cbind(Tab, Tab/nrow(Data)) # combine freq. and prop.
Tab <- Tab[order(Tab[,2], decreasing=TRUE),] # sort
colnames(Tab) <- c("freq", "prop") # add column names
Here dim(Tab)[1] is 22, so the result should be a 22x22 matrix.
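Two things affect the size of the matrix. First, read.table() splits on whitespace only, so a trailing comma stays part of the string and "cca," and "cca" are counted as different states unless the comma is stripped (for example with sub(",$", "", x)). Second, if a state occurs only as the first or only as the last observation, table(from, to) leaves it out of the columns or the rows and the result is not square. A sketch on a hypothetical four-element vector showing both points, forcing a square states-by-states matrix by giving both margins the same factor levels:

```r
# Hypothetical raw states as read from the file; note the trailing commas
raw <- c("aac,", "cca,", "bab,", "cca")

# Without cleaning, "cca," and "cca" are two distinct states
print(length(unique(raw)))     # 4

# Strip a trailing comma so identical states compare equal
x <- sub(",$", "", raw)
states <- sort(unique(x))      # "aac" "bab" "cca"

# "aac" occurs only first, so it never appears as a successor;
# using the same levels for both margins keeps it as a (zero) column
from <- factor(head(x, -1), levels = states)
to   <- factor(x[-1],       levels = states)
counts <- table(from, to)
print(dim(counts))             # 3 3
```

Without the levels argument, table(head(x, -1), x[-1]) would be 3x2 here.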
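Yet another way, with reshape2, yielding a 21x21 transition probability matrix. (There are 21 distinct states: read.table keeps the trailing commas, so the final "cca" and the earlier "cca," are read as two different states, which is where the count of 22 comes from.)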
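library(reshape2)
# States as character, with the trailing commas from the input stripped
Data$itemset <- sub(",$", "", as.character(Data$itemset))
Data$nextitem <- c(Data$itemset[-1], NA)  # successor of each state; the last one has none
Data$value <- 1
# Count transitions: rows = current state, columns = next state
df <- dcast(Data, itemset ~ nextitem, value.var = "value", fun.aggregate = length, fill = 0)
df <- df[-ncol(df)]                       # drop the NA column (last state has no successor)
df[-1] <- df[-1] / rowSums(df[-1]) # assuming no rows have all zeros
df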
# itemset aac aba abb abc aca acb acc baa bab bac bba bbc bca bcc caa cab cac cba cbb cca ccb
#1 aac 0 0.0 0 0.0000000 0 0 0 0.0000000 0.0 0.5 0.0000000 0 0 0 0.0 0.0000000 0.0 0.0000000 0.0000000 0.5 0
#2 aba 0 0.0 1 0.0000000 0 0 0 0.0000000 0.0 0.0 0.0000000 0 0 0 0.0 0.0000000 0.0 0.0000000 0.0000000 0.0 0
#3 abb 0 0.0 0 0.0000000 0 0 0 0.0000000 0.0 0.0 0.3333333 0 0 0 0.0 0.3333333 0.0 0.0000000 0.3333333 0.0 0
#4 abc 0 0.0 1 0.0000000 0 0 0 0.0000000 0.0 0.0 0.0000000 0 0 0 0.0 0.0000000 0.0 0.0000000 0.0000000 0.0 0
#5 aca 0 0.0 0 0.0000000 0 0 0 0.0000000 1.0 0.0 0.0000000 0 0 0 0.0 0.0000000 0.0 0.0000000 0.0000000 0.0 0
#6 acb 0 0.0 0 0.0000000 0 0 0 0.0000000 0.0 0.0 0.0000000 0 0 0 0.0 0.0000000 0.0 0.0000000 0.0000000 0.0 1
#7 acc 0 0.0 0 0.0000000 0 0 0 0.0000000 0.0 0.0 0.0000000 0 0 0 1.0 0.0000000 0.0 0.0000000 0.0000000 0.0 0
#8 baa 0 0.0 0 0.3333333 0 0 0 0.3333333 0.0 0.0 0.0000000 0 0 0 0.0 0.0000000 0.0 0.3333333 0.0000000 0.0 0
#9 bab 0 0.0 0 0.0000000 0 0 0 0.0000000 0.0 0.0 0.0000000 0 0 0 0.5 0.0000000 0.0 0.0000000 0.0000000 0.5 0
#10 bac 0 0.0 1 0.0000000 0 0 0 0.0000000 0.0 0.0 0.0000000 0 0 0 0.0 0.0000000 0.0 0.0000000 0.0000000 0.0 0
#11 bba 0 0.0 0 0.0000000 0 0 0 0.0000000 0.0 0.0 0.0000000 0 1 0 0.0 0.0000000 0.0 0.0000000 0.0000000 0.0 0
#12 bbc 1 0.0 0 0.0000000 0 0 0 0.0000000 0.0 0.0 0.0000000 0 0 0 0.0 0.0000000 0.0 0.0000000 0.0000000 0.0 0
#13 bca 0 0.0 0 0.0000000 0 0 1 0.0000000 0.0 0.0 0.0000000 0 0 0 0.0 0.0000000 0.0 0.0000000 0.0000000 0.0 0
#14 bcc 0 0.0 0 0.0000000 1 0 0 0.0000000 0.0 0.0 0.0000000 0 0 0 0.0 0.0000000 0.0 0.0000000 0.0000000 0.0 0
#15 caa 0 0.5 0 0.0000000 0 0 0 0.0000000 0.0 0.0 0.0000000 0 0 0 0.0 0.0000000 0.0 0.0000000 0.0000000 0.5 0
#16 cab 0 0.0 0 0.0000000 0 0 0 0.0000000 0.0 0.0 0.0000000 0 0 1 0.0 0.0000000 0.0 0.0000000 0.0000000 0.0 0
#17 cac 0 0.0 0 0.0000000 0 0 0 1.0000000 0.0 0.0 0.0000000 0 0 0 0.0 0.0000000 0.0 0.0000000 0.0000000 0.0 0
#18 cba 0 0.0 0 0.0000000 0 1 0 0.0000000 0.0 0.0 0.0000000 0 0 0 0.0 0.0000000 0.0 0.0000000 0.0000000 0.0 0
#19 cbb 0 0.0 0 0.0000000 0 0 0 1.0000000 0.0 0.0 0.0000000 0 0 0 0.0 0.0000000 0.0 0.0000000 0.0000000 0.0 0
#20 cca 0 0.0 0 0.0000000 0 0 0 0.0000000 0.5 0.0 0.0000000 0 0 0 0.0 0.0000000 0.5 0.0000000 0.0000000 0.0 0
#21 ccb 0 0.0 0 0.0000000 0 0 0 0.0000000 0.0 0.0 0.0000000 1 0 0 0.0 0.0000000 0.0 0.0000000 0.0000000 0.0 0
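The division step above assumes no row is all zeros. That holds for this data, but a state that occurs only as the very last observation would have no outgoing transitions once the NA column is dropped, and 0/0 gives NaN. A hedged sketch of one way to guard against that, on a small hypothetical count matrix rather than the question's data:

```r
# Hypothetical transition-count matrix; state "c" has no outgoing transitions
counts <- matrix(c(0, 2, 0,
                   1, 1, 0,
                   0, 0, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(from = c("a", "b", "c"), to = c("a", "b", "c")))

rs <- rowSums(counts)

# Divide each row by its total only where the total is positive;
# rows with no outgoing transitions stay all zeros instead of NaN
probs <- counts
probs[rs > 0, ] <- counts[rs > 0, , drop = FALSE] / rs[rs > 0]
print(probs)
```

Leaving such rows at zero (rather than, say, making the state absorbing with a 1 on the diagonal) is a choice; which one is appropriate depends on how the matrix will be used.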