I am trying to resolve my own question about changing the tf-idf weighting function in the tm package: https://stackoverflow.com/questions/15045313/changing-tf-idf-weight-function-weight-not-by-occurrences-of-term-but-by-numbe
In doing so, I'm looking at the weightTfIdf function, which includes the following code on m, a TermDocumentMatrix:
cs <- col_sums(m)
and
rs <- row_sums(m > 0)
But I can't find any documentation for the functions row_sums or col_sums, and when I try to write my own weighting function using them, I get an error:
Error in weighting(x) : could not find function "col_sums"
Where are these functions defined?
I've pasted the complete function definition, as printed by R, below:
function (m, normalize = TRUE) 
{
    isDTM <- inherits(m, "DocumentTermMatrix")
    if (isDTM) 
        m <- t(m)
    if (normalize) {
        cs <- col_sums(m)
        if (any(cs == 0)) 
            warning("empty document(s): ", paste(Docs(m)[cs == 
                0], collapse = " "))
        names(cs) <- seq_len(nDocs(m))
        m$v <- m$v/cs[m$j]
    }
    rs <- row_sums(m > 0)
    if (any(rs == 0)) 
        warning("unreferenced term(s): ", paste(Terms(m)[rs == 
            0], collapse = " "))
    lnrs <- log2(nDocs(m)/rs)
    lnrs[!is.finite(lnrs)] <- 0
    m <- m * lnrs
    attr(m, "Weighting") <- c(sprintf("%s%s", "term frequency - inverse document frequency", 
        if (normalize) " (normalized)" else ""), "tf-idf")
    if (isDTM) 
        t(m)
    else m
}
<environment: namespace:tm>
attr(,"class")
[1] "WeightFunction" "function"
attr(,"Name")
[1] "term frequency - inverse document frequency"
attr(,"Acronym")
[1] "tf-idf"
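For reference, the arithmetic in weightTfIdf can be reproduced on an ordinary dense matrix with base R alone, using colSums and rowSums in place of col_sums and row_sums. This is only a rough sketch, assuming a small numeric matrix with terms in rows and documents in columns; tfidf_dense and the toy counts are made up for illustration and are not part of tm.

```r
# Hypothetical dense-matrix sketch of the steps in tm's weightTfIdf.
# Assumes `tdm` is a numeric matrix: terms in rows, documents in columns.
tfidf_dense <- function(tdm, normalize = TRUE) {
  if (normalize) {
    cs <- colSums(tdm)               # total term count in each document
    cs[cs == 0] <- 1                 # avoid 0/0 for empty documents (column stays all zeros)
    tdm <- sweep(tdm, 2, cs, "/")    # term frequency within each document
  }
  rs <- rowSums(tdm > 0)             # number of documents containing each term
  idf <- log2(ncol(tdm) / rs)        # inverse document frequency
  idf[!is.finite(idf)] <- 0          # terms that appear in no document get weight 0
  tdm * idf                          # recycling scales row i by idf[i]
}

counts <- matrix(c(2, 0, 1,
                   1, 1, 1,
                   0, 3, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(Terms = c("apple", "the", "pear"),
                                 Docs = c("d1", "d2", "d3")))

# "the" occurs in every document, so log2(3/3) = 0 zeroes its whole row.
round(tfidf_dense(counts), 3)
```

The sparse version in tm does the same thing but only touches the stored non-zero entries (m$v indexed by m$i and m$j), which is why it needs slam's sparse-aware col_sums and row_sums rather than the base functions.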
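The functions you're looking for are in the slam package. tm only imports slam rather than attaching it, so tm's own functions (note that weightTfIdf's environment is namespace:tm) can call col_sums and row_sums directly, but a function you define yourself lives in the global environment, which can't see tm's imports; hence the "could not find function" error. For the same reason it takes a little bit of work to view the documentation. Here is an example session showing how one might go about figuring that out and looking at the documentation.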
> # I'm assuming you loaded tm first
> library(tm)
> # See if we can view the code
> col_sums
Error: object 'col_sums' not found
> # Use getAnywhere to grab the function even if the function is
> # in a namespace that isn't exported
> getAnywhere("col_sums")
A single object matching ‘col_sums’ was found
It was found in the following places
namespace:slam
with value
function (x, na.rm = FALSE, dims = 1, ...)
UseMethod("col_sums")
<environment: namespace:slam>
> # So the function is in the slam package
> slam::col_sums
function (x, na.rm = FALSE, dims = 1, ...)
UseMethod("col_sums")
<environment: namespace:slam>
> # We can tell help to look in the slam package now that we know
> # where the function is from
> help(col_sums, package = "slam")
> # alternatively
> library(slam)
> ?col_sums
> # If we want to view the actual code for col_sums we need to
> # do a little work too
> methods("col_sums")
[1] col_sums.default* col_sums.simple_triplet_matrix*
Non-visible functions are asterisked
> # The default method is the easiest to read; for a TermDocumentMatrix,
> # col_sums.simple_triplet_matrix is the method actually dispatched
> getAnywhere("col_sums.default")
A single object matching ‘col_sums.default’ was found
It was found in the following places
registered S3 method for col_sums from namespace slam
namespace:slam
with value
function (x, na.rm = FALSE, dims = 1, ...)
base:::colSums(x, na.rm, dims, ...)
<environment: namespace:slam>
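So the default method of col_sums is just a wrapper for the base function colSums. The simple_triplet_matrix method, which is the one dispatched for a TermDocumentMatrix, has its own implementation for the sparse representation; you can view it with getAnywhere("col_sums.simple_triplet_matrix"). row_sums is organised the same way.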
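So in your own weighting function you can call these with the slam:: prefix (or attach slam with library(slam) first). Below is a minimal sketch, assuming slam is installed; my_weight is a hypothetical name, and it simply mirrors the normalised tf-idf steps of weightTfIdf so you have a skeleton to modify. It assumes terms are in rows, so a DocumentTermMatrix would need to be transposed first, as weightTfIdf does.

```r
# Sketch only: assumes the slam package is installed. Nothing is attached;
# slam functions are called with the slam:: prefix, which is what a function
# defined outside tm's namespace needs in order to find col_sums/row_sums.

# Hypothetical weighting function mirroring weightTfIdf's steps.
# Assumes `m` is a simple_triplet_matrix with terms in rows and documents
# in columns (a tm TermDocumentMatrix inherits from simple_triplet_matrix).
my_weight <- function(m, normalize = TRUE) {
  if (normalize) {
    cs <- slam::col_sums(m)       # total count in each document
    m$v <- m$v / cs[m$j]          # divide each stored entry by its column total
  }
  rs <- slam::row_sums(m > 0)     # number of documents containing each term
  idf <- log2(m$ncol / rs)
  idf[!is.finite(idf)] <- 0       # unreferenced terms get weight 0
  m$v <- m$v * idf[m$i]           # scale each stored entry by its row's idf
  m
}

# Toy term-document counts: 3 terms x 3 documents, stored as triplets.
m <- slam::simple_triplet_matrix(
  i = c(1L, 1L, 2L, 2L, 2L, 3L),
  j = c(1L, 3L, 1L, 2L, 3L, 2L),
  v = c(2, 1, 1, 1, 1, 3),
  nrow = 3L, ncol = 3L
)

# The sparse col_sums agrees with base colSums on the dense equivalent.
stopifnot(isTRUE(all.equal(as.numeric(slam::col_sums(m)),
                           as.numeric(colSums(as.matrix(m))))))

as.matrix(my_weight(m))
```

To use something like this as a tm weighting you would presumably wrap it with tm's WeightFunction() constructor, which is what gives weightTfIdf the "WeightFunction" class and the Name/Acronym attributes shown in the question.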