Tags: r, dplyr, tidyr, entropy, traminer

calculating entropy in a sequence of letters


I am trying to calculate the Shannon entropy of sequences of letters, for example:

A <- c('A-A-A-A', 'A-B-C-D-E-E', 'A-B-D-F-G-E')

I would like to use the TraMineR package for this, but since I already have a sequence object, I have not been able to do so with the package. See the documentation for seqient:

http://traminer.unige.ch/doc/seqient.html

Any suggestions? Thanks


Solution

  • Perhaps:

    library(TraMineR)
    A <- c('A-A-A-A', 'A-B-C-D-E-E', 'A-B-D-F-G-E')
    B <- as.data.frame(A)
    ## Build a state sequence object from the "-"-separated strings
    actcal.seq <- seqdef(B)
    ## Summarize and plot histogram
    ## of within sequence entropy
    actcal.ient <- seqient(actcal.seq)
    summary(actcal.ient)
    hist(actcal.ient)
    

    UPDATE: Per OP's request, adding Entropy to original data:

     cbind(B, actcal.ient)
    #              A   Entropy
    #[1]     A-A-A-A 0.0000000
    #[2] A-B-C-D-E-E 0.8020465
    #[3] A-B-D-F-G-E 0.9207822
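
As a cross-check, the same values can be reproduced in base R without TraMineR. This is only a sketch under two assumptions: the entropy uses the natural logarithm, and it is normalized by the log of the alphabet size, where the alphabet is the set of distinct letters seen across all sequences (7 here). The helper `norm_entropy` is a hypothetical name, not part of any package.

```r
# Sequences from the question, states separated by "-"
A <- c("A-A-A-A", "A-B-C-D-E-E", "A-B-D-F-G-E")

# Split each sequence into its individual states
states <- strsplit(A, "-", fixed = TRUE)

# Alphabet: all distinct states observed across the sequences
alphabet <- sort(unique(unlist(states)))

# Shannon entropy of one sequence (natural log), divided by
# log(alphabet size) so that the result lies between 0 and 1
norm_entropy <- function(s, n_states) {
  p <- table(s) / length(s)  # share of positions spent in each visited state
  -sum(p * log(p)) / log(n_states)
}

entropy <- vapply(states, norm_entropy, numeric(1),
                  n_states = length(alphabet))
data.frame(A, Entropy = entropy)
```

For these three sequences the result agrees with the `Entropy` column above. With a different alphabet or a different normalization, the numbers would differ.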