Here is my code:
library(RCurl)
library(TraMineR)
library(PST)
x <- getURL("https://gist.githubusercontent.com/aronlindberg/08228977353bf6dc2edb3ec121f54a29/raw/c2539d06771317c5f4c8d3a2052a73fc485a09c6/challenge_level.csv")
data <- read.csv(text = x)
# Load and transform data
data <- read.table("thread_level.csv", sep = ",", header = F, stringsAsFactors = F)
data.seq <- seqdef(data[2:nrow(data),2:ncol(data)], missing = "NA", right = "*")
# Make a tree
S1 <- pstree(data.seq, ymin = 0.05, L = 6, lik = TRUE, with.missing = F)
logLik(S1)
For some reason, it does not return a log-likelihood value. Why is this the case, and how can I get one?
You have bad values for the missing and right arguments in your seqdef command, which then causes an error in pstree.
With
data.seq <- seqdef(data[2:nrow(data),2:ncol(data)], missing = NA, right= NA, nr = "*")
# Make a tree
S1 <- pstree(data.seq, ymin = 0.05, L = 6, lik = TRUE, with.missing = TRUE)
logLik(S1)
we get
'log Lik.' -31011.32 (df=47179)
Note that since you have missing values, I have set with.missing = TRUE in the pstree command.
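To see what the right argument does, here is a minimal sketch on made-up data. The toy data frame and the object names s_keep and s_del are assumptions for illustration only, not the data from the question. It assumes the missing cells are real NA values rather than the string "NA".

```r
library(TraMineR)

# Made-up toy data (not the thread_level.csv file from the question):
# three short sequences over the states "a" and "b", padded with real NAs.
toy <- data.frame(
  t1 = c("a", "b", "a"),
  t2 = c("b", "a", "a"),
  t3 = c(NA,  "a", NA),
  stringsAsFactors = FALSE
)

# right = NA keeps trailing NAs as an explicit missing state (coded nr = "*"),
# so a tree fitted on this object would need with.missing = TRUE.
s_keep <- seqdef(toy, missing = NA, right = NA, nr = "*")

# right = "DEL" drops trailing NAs, so each sequence ends at its last valid state.
s_del <- seqdef(toy, missing = NA, right = "DEL")

print(s_keep)
print(s_del)

# Compare the sequence lengths under the two settings.
seqlength(s_keep)
seqlength(s_del)
```

With right = NA the trailing gaps count as positions in the sequence, while with right = "DEL" the sequences end up with unequal lengths.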
===============
To ignore the right missings, set right = "DEL" in seqdef.
seq <- seqdef(data[2:nrow(data),2:ncol(data)], missing = NA, right= "DEL")
S2 <- pstree(seq, ymin = 0.05, L = 6, lik = TRUE, with.missing = F)
logLik(S2)
I don't know what PST computes as logLik(S2) or why we get an NA here. The likelihood of generating the data with the tree S2 can be obtained by means of the predict function, which returns the likelihood of each sequence in the data. The log-likelihood of the data should then be
sum(log(predict(S2, seq)))
which gives
[>] 984 sequence(s) - min/max length: 1/32
[!] sequences have unequal lengths
[>] max. context length: L=6
[>] found 1020 distinct context(s)
[>] total time: 0.588 secs
[1] -4925.79
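As a side note on why the logs are summed rather than taking the log of the product: assuming the sequences are treated as independent, both give the same quantity in theory, but the product of many small probabilities underflows in floating point. A minimal base R sketch, where the vector p is a hypothetical stand-in for the per-sequence likelihoods returned by predict:

```r
# Hypothetical per-sequence likelihoods (made-up values, not PST output).
p_small <- c(0.20, 0.05, 0.10)

# For a handful of sequences the two formulations agree.
ll_sum  <- sum(log(p_small))
ll_prod <- log(prod(p_small))
all.equal(ll_sum, ll_prod)

# With as many sequences as in the data (984), the product underflows to 0,
# so its log is -Inf, while the sum of logs stays finite.
p <- rep(1e-5, 984)
prod(p)        # underflows to 0
log(prod(p))   # -Inf
sum(log(p))    # finite log-likelihood
```

This is why sum(log(...)) is the safer way to aggregate the per-sequence likelihoods.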