Tags: r, text-mining, word-frequency, term-document-matrix

Term frequency matrix


I have a string like this:

m<-"abcdabcdbcadacbddabcc..."

I would like to generate a matrix like this:

(image of the desired matrix)

How can I do that in R?


Solution

  • This gives what I believe you're after:

    m <- "abcdabcdbcadacbddabcc"
    
    library(qdap)
    
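    # Distinct characters that occur in the string
    chars <- unique(unlist(strsplit(m, "")))
    # Every possible 3-character term over those characters
    terms <- paste2(expand.grid(rep(list(chars), 3)), sep="")
    # Count each term in m, drop the first two columns (not term counts), and transpose
    t(counts(termco(m, match.list=sort(terms)))[, -c(1:2)])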
    

    Output:

        1
    aaa 0
    aab 0
    aac 0
    aad 0
    aba 0
    .
    .
    .
    dcc 0
    dcd 0
    dda 1
    ddb 0
    ddc 0
    ddd 0
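
If you would rather not depend on qdap, here is a minimal base R sketch of the same idea. It assumes you want counts of every overlapping 3-character substring, which may not be exactly how `termco` counts matches, so treat it as an alternative rather than a drop-in replacement. The names `n`, `trigrams`, `grid`, `freq` and `result` are just illustrative.

```r
m <- "abcdabcdbcadacbddabcc"
n <- 3  # term length (assumption: 3-character terms, as in the answer above)

# All overlapping substrings of length n
starts <- seq_len(nchar(m) - n + 1)
trigrams <- substring(m, starts, starts + n - 1)

# Every possible term over the characters that occur in m, sorted
chars <- sort(unique(strsplit(m, "")[[1]]))
grid <- expand.grid(rep(list(chars), n), stringsAsFactors = FALSE)
terms <- sort(do.call(paste0, grid))

# Tabulate against the full set of terms so absent terms get a count of 0
freq <- table(factor(trigrams, levels = terms))

# One-column matrix with the terms as row names
result <- matrix(as.integer(freq), ncol = 1, dimnames = list(terms, "count"))
result
```

Using `factor(..., levels = terms)` is what keeps the zero rows: a plain `table(trigrams)` would only list the terms that actually occur.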