r nlp lda pyldavis

Count the number of tokens in a DocumentTermMatrix


I have a question about a DocumentTermMatrix. I would like to use the LDAvis package in R. To visualize the results of my LDA model, I need the number of tokens in every document, but I don't have the text corpus that the DTM was built from. Does anyone know how I can calculate the number of tokens for every document? The ideal output would be a list with each document's name and its token count.

Kind Regards, Tom


Solution

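  • You can use slam::row_sums. It computes the row sums of a document-term matrix directly on its sparse representation, without first converting the DTM into a dense matrix. Each row is a document, so the row sum is that document's token count (assuming the DTM holds raw term frequencies). The function comes from the slam package, which is installed as a dependency when you install the tm package.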

    # named numeric vector: one token count per document
    count_tokens <- slam::row_sums(dtm_goes_here)

    # if you want a list (names are the document names)
    count_tokens_list <- as.list(count_tokens)
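To see what this returns, here is a minimal sketch on a toy matrix. The document names, terms, and counts are made up for illustration, and the matrix is built directly with slam::simple_triplet_matrix (the sparse format a tm DocumentTermMatrix is based on) so that no corpus is needed. It assumes the DTM stores raw term frequencies. If it was weighted (for example tf-idf), or terms were dropped when it was built, the row sums only reflect what is left in the matrix.

```r
library(slam)

# Toy stand-in for a DTM: 3 documents x 3 terms (hypothetical data)
dtm <- simple_triplet_matrix(
  i = c(1L, 1L, 2L, 2L, 3L, 3L),   # document (row) index
  j = c(1L, 2L, 2L, 3L, 1L, 3L),   # term (column) index
  v = c(2, 1, 1, 1, 1, 3),         # term counts
  nrow = 3, ncol = 3,
  dimnames = list(Docs  = c("doc1", "doc2", "doc3"),
                  Terms = c("apple", "banana", "cherry"))
)

# Tokens per document: one row sum per document
doc_length <- row_sums(dtm)

# Occurrences per term across all documents: one column sum per term
term_frequency <- col_sums(dtm)

# Document name next to its token count
token_table <- data.frame(document = names(doc_length),
                          tokens   = as.vector(doc_length))
print(token_table)
```

For LDAvis, doc_length and term_frequency are the kind of values that createJSON takes as its doc.length and term.frequency arguments.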