markov-chains pst traminer sequence-analysis

Getting log-likelihood from probabilistic suffix tree


Here is my code:

library(RCurl)
library(TraMineR)
library(PST)

x <- getURL("https://gist.githubusercontent.com/aronlindberg/08228977353bf6dc2edb3ec121f54a29/raw/c2539d06771317c5f4c8d3a2052a73fc485a09c6/challenge_level.csv")
# Load and transform data (header = F keeps the header row as row 1 of the data)
data <- read.table(text = x, sep = ",", header = F, stringsAsFactors = F)

data.seq <- seqdef(data[2:nrow(data),2:ncol(data)], missing = "NA", right = "*")

# Make a tree
S1 <- pstree(data.seq, ymin = 0.05, L = 6, lik = TRUE, with.missing = F)
logLik(S1)

For some reason, logLik(S1) does not return a log-likelihood value. Why is this the case, and how can I get one?


Solution

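  • You have invalid values for the missing and right arguments in your seqdef call, which then cause an error in pstree. In the call below, missing and right are set to the NA value (not the string "NA"), and "*" is passed as nr, the symbol used to represent missing states.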

    With

    data.seq <- seqdef(data[2:nrow(data),2:ncol(data)], missing = NA, right= NA, nr = "*")
    # Make a tree
    S1 <- pstree(data.seq, ymin = 0.05, L = 6, lik = TRUE, with.missing = TRUE)
    logLik(S1)
    

    we get

    'log Lik.' -31011.32 (df=47179)
    

    Note that, because your data contain missing values, I have set with.missing = TRUE in the pstree call.
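
    Whether with.missing is needed depends on where the missing values sit in each sequence. The following base R sketch is one way to check this before calling seqdef. It assumes the raw table stores missing values as NA; the toy data frame and the helper count_missing are hypothetical and not part of TraMineR or PST.

```r
# Toy data standing in for the sequence columns (hypothetical values).
toy <- data.frame(
  t1 = c("a", "b", "a"),
  t2 = c("b", NA,  "a"),
  t3 = c(NA,  "a", NA),
  t4 = c(NA,  "b", NA),
  stringsAsFactors = FALSE
)

# For one row, count the trailing run of NAs (missing values on the right)
# and the NAs that occur before the last observed state (internal gaps).
count_missing <- function(row) {
  is_na <- is.na(row)
  observed <- which(!is_na)
  last_obs <- if (length(observed) > 0) max(observed) else 0
  c(right = length(row) - last_obs,
    internal = sum(is_na[seq_len(last_obs)]))
}

# One row per sequence, with columns "right" and "internal".
missing_summary <- t(apply(toy, 1, count_missing))
print(missing_summary)
```

    Rows with a non-zero internal count still contain missing states after trailing NAs are dropped, so they are the ones to look at when deciding how to treat missing values.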

    ===============

    To ignore missing values on the right (at the end of the sequences), set right = "DEL" in seqdef.

    seq <- seqdef(data[2:nrow(data),2:ncol(data)], missing = NA, right= "DEL")
    S2 <- pstree(seq, ymin = 0.05, L = 6, lik = TRUE, with.missing = F)
    logLik(S2)
    

    I don't know what PST computes for logLik(S2), or why it returns NA here. The likelihood of the data under the tree S2 can be obtained with the predict function, which returns the likelihood of each sequence in the data. The log-likelihood of the data should then be

    sum(log(predict(S2, seq)))
    

    which gives

     [>] 984 sequence(s) - min/max length: 1/32
     [!] sequences have unequal lengths
     [>] max. context length: L=6
     [>] found 1020 distinct context(s)
     [>] total time: 0.588 secs
    [1] -4925.79
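
    The sum(log(...)) step above can be illustrated without PST. This is a minimal base R sketch, assuming the sequences are treated as independent so that the likelihood of the data is the product of the per-sequence likelihoods. The probability vectors p and p_many are made-up values standing in for predict() output.

```r
# Hypothetical per-sequence probabilities, standing in for predict() output.
p <- c(0.02, 0.15, 0.0004, 0.3)

# Log-likelihood of the whole data set: sum of the per-sequence logs.
ll <- sum(log(p))

# For a handful of values this equals the log of the product.
stopifnot(isTRUE(all.equal(ll, log(prod(p)))))

# With many small probabilities the product underflows to 0,
# while the sum of logs stays finite.
p_many <- rep(1e-5, 984)
prod_many <- prod(p_many)      # underflows to 0
ll_many <- sum(log(p_many))    # finite negative number
print(c(product = prod_many, loglik = ll_many))
```

    Note that a single NA among the per-sequence probabilities makes the sum NA, and a probability of 0 makes it -Inf, so it is worth inspecting the predict() output before summing.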