I'm trying to write a script that simplifies the process of producing a clean corpus from a vector or data frame for text mining and NLP. However, my script produces an error when I run it. My script is as follows:
quick_clean <- function(data, Vector = TRUE, removeNumbers = TRUE, removePunctuation = TRUE,
                        stop.words = NULL, ...) {
  if(Vector == TRUE) {
    source <- VectorSource(data)
  } else {
    source <- DataframeSource(data)
  }
  corp <- VCorpus(source)
  corp <- tm_map(corp, stripWhitespace)
  if(removePunctuation == TRUE) {
    corp <- tm_map(corp, removePunctuation)
  }
  if(removeNumbers == TRUE) {
    corp <- tm_map(corp, removeNumbers)
  }
  if(is.null(stop.words)) {
    return(corp)
  } else {
    corp <- tm_map(corp, removeWords, c(stopwords("en"), stop.words))
  }
  corp
}
When I run it, I get the following error:
Error in get(as.character(FUN), mode = "function", envir = envir) :
object 'FUN' of mode 'function' was not found
I ran the traceback, but I'm not really sure how to use this information:
7. get(as.character(FUN), mode = "function", envir = envir)
6. match.fun(FUN)
5. lapply(X, FUN, ...)
4. tm_parLapply(content(x), FUN, ...)
3. tm_map.VCorpus(corp, removePunctuation)
2. tm_map(corp, removePunctuation)
1. quick_clean(swift_vec)
I also ran Debug and got the following...again, I'm not sure how to use this info:
Error in get(as.character(FUN), mode = "function", envir = envir) :
object 'FUN' of mode 'function' was not found
Called from: get(as.character(FUN), mode = "function", envir = envir)
Browse[1]>
What am I doing wrong here?
Let's read the traceback from the bottom up:
quick_clean: the error occurs inside your function.
corp <- tm_map(corp, removePunctuation): this is the line that triggers it. Luckily, you only have one such line.
tm_map: the function itself calls the method tm_map.VCorpus, because your corp object is of class VCorpus and tm_map is a wrapper for different methods.
tm_parLapply, etc.: once you reach a reliable function in the traceback, it is usually not useful to go much further. An error there means the input you gave to the function isn't good.
We learnt that you gave a VCorpus object as the first argument, so this one seems to be OK, though we may check later whether its format is problematic.
But let's check the other argument, removePunctuation. The doc (?tm_map) says it must be a function. If you use debug, debugonce or browser (look them up), you'll see that it is a boolean at the time you execute the line.
It is a boolean because you named your function parameters just like those functions: inside quick_clean, removePunctuation and removeNumbers refer to your TRUE/FALSE arguments, not to the tm functions.
So rename your function parameters and hopefully it will run fine :).
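As a rough sketch of that rename, assuming the tm package is installed: the parameter names is_vector, drop_numbers and drop_punctuation are my own hypothetical choices, and any names that do not collide with tm functions would do. The sample strings are made up for illustration.

```r
library(tm)  # assumption: tm is installed

# Same steps as the original function; only the flag names changed, so that
# removePunctuation and removeNumbers still refer to the tm functions.
quick_clean <- function(data, is_vector = TRUE, drop_numbers = TRUE,
                        drop_punctuation = TRUE, stop.words = NULL, ...) {
  src <- if (is_vector) VectorSource(data) else DataframeSource(data)
  corp <- VCorpus(src)
  corp <- tm_map(corp, stripWhitespace)
  if (drop_punctuation) {
    corp <- tm_map(corp, removePunctuation)  # now a function, not TRUE
  }
  if (drop_numbers) {
    corp <- tm_map(corp, removeNumbers)
  }
  if (!is.null(stop.words)) {
    corp <- tm_map(corp, removeWords, c(stopwords("en"), stop.words))
  }
  corp
}

# Made-up sample input
corp <- quick_clean(c("Hello, world! 123", "Shake it off, 2014."))
content(corp[[1]])
```

Calling tm::removePunctuation explicitly inside the original function should also sidestep the clash, but renaming keeps the code easier to read.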
Here's how you may use browser. Define this function (spot the added line):
quick_clean <- function(data, Vector = TRUE, removeNumbers = TRUE, removePunctuation = TRUE,
                        stop.words = NULL, ...) {
  if(Vector == TRUE) {
    source <- VectorSource(data)
  } else {
    source <- DataframeSource(data)
  }
  corp <- VCorpus(source)
  corp <- tm_map(corp, stripWhitespace)
  if(removePunctuation == TRUE) {
    browser() # <----------------------------------------- here !
    corp <- tm_map(corp, removePunctuation)
  }
  if(removeNumbers == TRUE) {
    corp <- tm_map(corp, removeNumbers)
  }
  if(is.null(stop.words)) {
    return(corp)
  } else {
    corp <- tm_map(corp, removeWords, c(stopwords("en"), stop.words))
  }
  corp
}
Then run the call that triggered the error again. At the browser prompt:
Type class(corp) to confirm what we already know.
Type class(removePunctuation). Oops, it's a boolean ("logical").
Type Q or press the escape key to get out of the browser.
debug is like browser, but starts at the first line of the function.
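If you'd rather see the same thing without tm or an interactive session, here is a minimal base R sketch of the name clash. Everything in it is a hypothetical stand-in: strip_punct plays the role of removePunctuation, and map_docs plays the role of tm_map by forwarding FUN to lapply, as the traceback shows tm doing.

```r
# Hypothetical stand-in for tm's removePunctuation
strip_punct <- function(x) gsub("[[:punct:]]", "", x)

# Hypothetical stand-in for tm_map: forwards FUN to lapply
map_docs <- function(docs, FUN) lapply(docs, FUN)

# What the browser session shows: inside the function, the name is the flag
arg_class <- function(strip_punct = TRUE) class(strip_punct)
arg_class()  # "logical"

# Buggy: the logical parameter shadows the function of the same name
clean_buggy <- function(docs, strip_punct = TRUE) {
  if (strip_punct) docs <- map_docs(docs, strip_punct)  # passes TRUE
  docs
}

# Fixed: the flag has its own name, so strip_punct is the function again
clean_fixed <- function(docs, drop_punct = TRUE) {
  if (drop_punct) docs <- map_docs(docs, strip_punct)
  docs
}

docs <- list("Hello, world!", "It's 2 a.m.")

# Capture the error message instead of stopping the script
buggy_msg <- tryCatch(clean_buggy(docs), error = function(e) conditionMessage(e))
fixed <- clean_fixed(docs)
```

Here buggy_msg should hold the same "object 'FUN' of mode 'function' was not found" message as in the question, while fixed holds the cleaned strings.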