Can't find documentation for R function row_sums and col_sums


I am trying to resolve my own question regarding changing the tf-idf weighting function in the tm package: https://stackoverflow.com/questions/15045313/changing-tf-idf-weight-function-weight-not-by-occurrences-of-term-but-by-numbe

In doing so, I'm looking at the weightTfIdf function, which includes the following code operating on m, a TermDocumentMatrix:

cs <- col_sums(m)

and

rs <- row_sums(m)

But I can't find any documentation for the functions row_sums or col_sums, and when I try to write my own weighting function using them, I get this error:

Error in weighting(x) : could not find function "col_sums"

Where are these functions defined?

I've pasted the complete function information from R below:

function (m, normalize = TRUE) 
{
    isDTM <- inherits(m, "DocumentTermMatrix")
    if (isDTM) 
        m <- t(m)
    if (normalize) {
        cs <- col_sums(m)
        if (any(cs == 0)) 
            warning("empty document(s): ", paste(Docs(m)[cs == 
                0], collapse = " "))
        names(cs) <- seq_len(nDocs(m))
        m$v <- m$v/cs[m$j]
    }
    rs <- row_sums(m > 0)
    if (any(rs == 0)) 
        warning("unreferenced term(s): ", paste(Terms(m)[rs == 
            0], collapse = " "))
    lnrs <- log2(nDocs(m)/rs)
    lnrs[!is.finite(lnrs)] <- 0
    m <- m * lnrs
    attr(m, "Weighting") <- c(sprintf("%s%s", "term frequency - inverse document frequency", 
        if (normalize) " (normalized)" else ""), "tf-idf")
    if (isDTM) 
        t(m)
    else m
}
<environment: namespace:tm>
attr(,"class")
[1] "WeightFunction" "function"      
attr(,"Name")
[1] "term frequency - inverse document frequency"
attr(,"Acronym")
[1] "tf-idf"
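Since the goal is a custom weighting function, it may help to see the same steps on an ordinary dense matrix. The sketch below uses a hypothetical helper, `tfidf_dense`, written only with base R (`colSums`, `rowSums`, `sweep`) so it runs without tm or slam. It assumes a term x document matrix (terms as rows), which is the orientation weightTfIdf works in after transposing a DocumentTermMatrix. It is not tm's implementation, which operates on the sparse triplet storage directly.

```r
# Hypothetical helper (not part of tm): tf-idf on a dense term x document
# matrix, following the same steps as the weightTfIdf source printed above.
tfidf_dense <- function(m, normalize = TRUE) {
  if (normalize) {
    cs <- colSums(m)                      # total term count per document
    if (any(cs == 0))
      warning("empty document(s): ",
              paste(colnames(m)[cs == 0], collapse = " "))
    # Divide each column by its document length; empty columns are divided
    # by 1 so they stay 0 instead of becoming NaN (0/0).
    m <- sweep(m, 2, ifelse(cs == 0, 1, cs), "/")
  }
  rs <- rowSums(m > 0)                    # number of documents containing each term
  lnrs <- log2(ncol(m) / rs)              # inverse document frequency, base 2
  lnrs[!is.finite(lnrs)] <- 0             # unreferenced terms (rs == 0) get weight 0
  m * lnrs                                # lnrs recycles down columns: row i scaled by lnrs[i]
}

# Small example: 3 terms (rows) x 2 documents (columns)
m <- matrix(c(2, 0, 1,     # counts in d1
              1, 1, 0),    # counts in d2
            nrow = 3,
            dimnames = list(c("apple", "banana", "cherry"), c("d1", "d2")))
w <- tfidf_dense(m)
print(w)
```

In this example `apple` gets weight 0 in every document because it occurs in both documents, so its idf is log2(2/2) = 0.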

Solution

  • The functions you're looking for are in the slam package. Since tm only imports slam (it is listed under Imports rather than Depends), slam isn't attached when you load tm, so it takes a little work to view the documentation. Here is an example session showing how to track the functions down and view their documentation.

    > # I'm assuming you loaded tm first
    > library(tm)
    > # See if we can view the code
    > col_sums
    Error: object 'col_sums' not found
    > # Use getAnywhere to grab the function even if the function is 
    > # in a namespace that isn't exported
    > getAnywhere("col_sums")
    A single object matching ‘col_sums’ was found
    It was found in the following places
      namespace:slam
    with value
    
    function (x, na.rm = FALSE, dims = 1, ...) 
    UseMethod("col_sums")
    <environment: namespace:slam>
    > # So the function is in the slam package
    > slam::col_sums
    function (x, na.rm = FALSE, dims = 1, ...) 
    UseMethod("col_sums")
    <environment: namespace:slam>
    > # We can tell help to look in the slam package now that we know
    > # where the function is from    
    > help(col_sums, package = "slam")
    > # alternatively
    > library(slam)
    > ?col_sums
    > # If we want to view the actual code for col_sums we need to 
    > # do a little work too
    > methods("col_sums")
    [1] col_sums.default*               col_sums.simple_triplet_matrix*
    
       Non-visible functions are asterisked
    > # We probably want the default version?  Otherwise change to the other one
    > getAnywhere("col_sums.default")
    A single object matching ‘col_sums.default’ was found
    It was found in the following places
      registered S3 method for col_sums from namespace slam
      namespace:slam
    with value
    
    function (x, na.rm = FALSE, dims = 1, ...) 
    base:::colSums(x, na.rm, dims, ...)
    <environment: namespace:slam>
    

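    So the default col_sums method is just a wrapper for the base function colSums. A TermDocumentMatrix, however, also inherits from class simple_triplet_matrix, so col_sums(m) inside weightTfIdf dispatches to col_sums.simple_triplet_matrix, which handles the sparse triplet representation. weightTfIdf can find col_sums because tm's namespace imports slam; a function you define at the top level cannot, which is why you get the "could not find function" error. In your own weighting function, either call them with the namespace prefix (slam::col_sums, slam::row_sums) or attach the package with library(slam) first.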
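To see what the simple_triplet_matrix method has to compute, here is a base-R sketch using a hypothetical helper, `triplet_sums` (an illustration only, not slam's implementation). It sums values stored as (i, j, v) triplets, the row index / column index / value layout slam uses for simple_triplet_matrix objects; the `m$v/cs[m$j]` line in weightTfIdf relies on the same layout. Summing v by j gives the column sums (document lengths), and summing the indicator v > 0 by i gives what row_sums(m > 0) returns: the number of documents containing each term.

```r
# Hypothetical helper (illustration only, not slam's code): sum the values v
# into n slots, grouped by the 1-based index vector `index`.
# Slots that never appear in `index` keep a sum of 0.
triplet_sums <- function(index, v, n) {
  out <- numeric(n)
  for (k in seq_along(v)) {
    out[index[k]] <- out[index[k]] + v[k]
  }
  out
}

# The 3 x 2 matrix rbind(c(2, 1), c(0, 1), c(1, 0)) stored as triplets;
# zero entries are simply not stored.
i <- c(1L, 3L, 1L, 2L)   # row (term) indices
j <- c(1L, 1L, 2L, 2L)   # column (document) indices
v <- c(2, 1, 1, 1)       # stored values

cs <- triplet_sums(j, v, 2)                   # analogous to col_sums(m)
rs <- triplet_sums(i, as.numeric(v > 0), 3)   # analogous to row_sums(m > 0)

# Rebuild the dense matrix to compare with base colSums/rowSums
dense <- matrix(0, nrow = 3, ncol = 2)
dense[cbind(i, j)] <- v
print(cs)
print(rs)
print(all.equal(cs, colSums(dense)))
print(all.equal(rs, rowSums(dense > 0)))
```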