
How to use Traceback and debug to fix broken R code?


I'm trying to write a script that simplifies the process of producing a clean corpus from a vector or data frame for text mining and NLP. However, my script produces an error when I run it. My script is as follows:

  quick_clean <- function(data, Vector = TRUE, removeNumbers = TRUE, removePunctuation = TRUE, 
                     stop.words = NULL, ...) {
  if(Vector == TRUE) {
    source <- VectorSource(data)
  } else {
    source <- DataframeSource(data)
  }
  corp <- VCorpus(source)
  corp <- tm_map(corp, stripWhitespace)

  if(removePunctuation == TRUE) {
    corp <- tm_map(corp, removePunctuation)
  }
  if(removeNumbers == TRUE) {
    corp <- tm_map(corp, removeNumbers)
  }
  if(is.null(stop.words)) {
   return(corp)
  } else {
    corp <- tm_map(corp, removeWords, c(stopwords("en"), stop.words))
  }
  corp
}

When I run it, I get the following error:

Error in get(as.character(FUN), mode = "function", envir = envir) : 
object 'FUN' of mode 'function' was not found 

I ran the traceback, but I'm not really sure how to use this information:

7. get(as.character(FUN), mode = "function", envir = envir) 
6. match.fun(FUN) 
5. lapply(X, FUN, ...) 
4. tm_parLapply(content(x), FUN, ...) 
3. tm_map.VCorpus(corp, removePunctuation) 
2. tm_map(corp, removePunctuation) 
1. quick_clean(swift_vec)

I also ran Debug and got the following...again, I'm not sure how to use this info:

Error in get(as.character(FUN), mode = "function", envir = envir) : 
  object 'FUN' of mode 'function' was not found
Called from: get(as.character(FUN), mode = "function", envir = envir)
Browse[1]> 

What am I doing wrong here?


Solution

  • Let's read the traceback stack from the bottom up:

    1. The error occurs inside quick_clean.
    2. It is raised on the corp <- tm_map(corp, removePunctuation) line; luckily there is only one such call.
    3. tm_map is a generic that dispatches to different methods; because your corp object is of class VCorpus, it calls the method tm_map.VCorpus.
    4. That method in turn calls tm_parLapply, and so on.

    Once the traceback reaches a well-tested package function, it is usually not useful to read much further: the likely cause is that the input you passed to that function is wrong.

    We learnt that you passed a VCorpus object as the first argument, so that one seems to be fine, though we may check later whether its format is problematic.

    But let's check the other argument, removePunctuation. The documentation (?tm_map) says it must be a function. If you use debug, debugonce or browser (look them up), you'll see that it is a logical (boolean) value at the time that line is executed.

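    That happens because you named your function's parameters removePunctuation and removeNumbers, exactly like the tm functions: inside quick_clean, passing one of those names as an argument hands tm_map the TRUE/FALSE value rather than the function.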
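    The name clash described above can be reproduced without tm. The following is a minimal base-R sketch, not the tm internals: apply_fun is a hypothetical stand-in for tm_map that simply forwards its FUN argument to lapply, and shout_buggy / shout_fixed are made-up names used only for illustration.

```r
# Hypothetical stand-in for tm_map: it forwards FUN to lapply(),
# much like the tm_parLapply -> lapply chain in the traceback.
apply_fun <- function(x, FUN) lapply(x, FUN)

# Buggy: the flag `toupper` has the same name as base::toupper().
shout_buggy <- function(words, toupper = TRUE) {
  if (toupper) {
    # Passed as a value, `toupper` is the local logical TRUE, not the function.
    words <- apply_fun(words, toupper)
  }
  words
}

# Fixed: the flag has its own name, so `toupper` resolves to the function.
shout_fixed <- function(words, to_upper = TRUE) {
  if (to_upper) {
    words <- apply_fun(words, toupper)
  }
  words
}

# Capture the error message from the buggy version instead of stopping.
buggy_message <- tryCatch(
  {
    shout_buggy(c("a", "b"))
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
fixed_result <- shout_fixed(c("a", "b"))

print(buggy_message)
print(fixed_result)
```

    The buggy version fails inside lapply with a message about 'FUN', just like the error in the question, while the renamed version runs.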

    So rename your function parameters and hopefully it will run fine :).
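    One way the renamed version might look is sketched below. The parameter names remove_numbers and remove_punctuation are only a suggestion, and the tm:: prefixes are an extra precaution I added (they also let the sketch be defined without attaching tm); the rest follows the original function.

```r
# Sketch of quick_clean with the flags renamed (hypothetical names).
# The tm functions are namespace-qualified so no local name can shadow them.
quick_clean <- function(data, Vector = TRUE, remove_numbers = TRUE,
                        remove_punctuation = TRUE, stop.words = NULL, ...) {
  # `src` instead of `source`, to avoid reusing the name of base::source().
  src <- if (Vector) tm::VectorSource(data) else tm::DataframeSource(data)
  corp <- tm::VCorpus(src)
  corp <- tm::tm_map(corp, tm::stripWhitespace)

  if (remove_punctuation) {
    corp <- tm::tm_map(corp, tm::removePunctuation)
  }
  if (remove_numbers) {
    corp <- tm::tm_map(corp, tm::removeNumbers)
  }
  if (!is.null(stop.words)) {
    corp <- tm::tm_map(corp, tm::removeWords,
                       c(tm::stopwords("en"), stop.words))
  }
  corp
}

# Usage, assuming the tm package is installed:
if (requireNamespace("tm", quietly = TRUE)) {
  corp <- quick_clean(c("Hello, world 123!", "R is   fun."))
  print(corp)
}
```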

    Here's how you can use browser:

    Define this function (note the added line):

    quick_clean <- function(data, Vector = TRUE, removeNumbers = TRUE, removePunctuation = TRUE, 
                         stop.words = NULL, ...) {
      if(Vector == TRUE) {
        source <- VectorSource(data)
      } else {
        source <- DataframeSource(data)
      }
      corp <- VCorpus(source)
      corp <- tm_map(corp, stripWhitespace)
    
      if(removePunctuation == TRUE) {
        browser() # <----------------------------------------- here !
        corp <- tm_map(corp, removePunctuation)
      }
      if(removeNumbers == TRUE) {
        corp <- tm_map(corp, removeNumbers)
      }
      if(is.null(stop.words)) {
       return(corp)
      } else {
        corp <- tm_map(corp, removeWords, c(stopwords("en"), stop.words))
      }
      corp
    }
    

    1. Run the call that triggered the error, quick_clean(swift_vec); execution pauses at browser().
    2. Type class(corp) to confirm what we already know.
    3. Type class(removePunctuation). Oops, it's a logical.
    4. Type Q or press the Escape key to leave the browser.

    debug is like browser, but it pauses at the first line of the function every time the function is called; debugonce does the same for a single call only.
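    As a small sketch of how the debug flag behaves, the snippet below uses a toy function (add_then_double is a made-up name) and only inspects the flag with isdebugged, so it runs non-interactively. The actual stepping is left commented out because it needs an interactive session.

```r
# Toy function used only for illustration.
add_then_double <- function(x) {
  y <- x + 1
  y * 2
}

# debug() flags the function: every later call would open the browser
# at its first line.
debug(add_then_double)
flag_after_debug <- isdebugged(add_then_double)

# undebug() removes the flag again.
undebug(add_then_double)
flag_after_undebug <- isdebugged(add_then_double)

# debugonce() sets the flag for a single call only. Uncomment in an
# interactive session to step through with n (next), c (continue), Q (quit):
# debugonce(add_then_double)
# add_then_double(1)
```

    For the function in the question, the interactive equivalent would be debugonce(quick_clean) followed by quick_clean(swift_vec).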