
RserveException: eval failed, syntax error


I have an R function that removes all HTML markup from an HTML page. It works when I run it directly in R, but when I run it through Rserve it produces this error:

Exception in thread "main" org.rosuda.REngine.Rserve.RserveException: eval failed, request status: R parser: syntax error

at org.rosuda.REngine.Rserve.RConnection.eval(RConnection.java:234)
at CereScope_Data.main(CereScope_Data.java:80)

The Java eval call where I get the error:

REXP lstrRemoveHtml = cobjConn.eval("RemoveHtml('" + lstrRawData + "')");
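Why this fails: the call pastes the raw HTML between single quotes, so the first apostrophe in the page (for example in "It's") closes the R string early and R tries to parse the rest of the markup as code. The Java sketch below illustrates this with a simplified model of how R reads a single-quoted literal (a backslash escapes the next character, an unescaped quote ends the string). closingQuoteIndex is a hypothetical helper, not part of Rserve, and it is not R's actual parser.

```java
public class RLiteralEndDemo {

    // Hypothetical helper: src.charAt(start) must be an opening single quote.
    // Returns the index of the quote that closes the literal, or -1 if it never
    // closes. A backslash skips the following character (simplified R lexing).
    public static int closingQuoteIndex(String src, int start) {
        for (int i = start + 1; i < src.length(); i++) {
            char c = src.charAt(i);
            if (c == '\\') {
                i++; // the escaped character cannot end the literal
            } else if (c == '\'') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String rawHtml = "<p>It's here</p>";
        // Same string concatenation as the eval call in the question.
        String command = "RemoveHtml('" + rawHtml + "')";
        int open = command.indexOf('\'');
        int close = closingQuoteIndex(command, open);
        System.out.println(command);
        System.out.println("literal closes at index " + close + " of " + command.length());
    }
}
```

For the sample input the literal closes at index 17 (the apostrophe in "It's") instead of at the quote just before the final parenthesis, and everything after it is what R's parser rejects as a syntax error.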

My R function (rawdata is the HTML of a page):

RemoveHtml <- function(rawdata) {
  
  library("tm")
  
  ## Converting data to UTF-8
  ## Creating the corpus
  Encoding(rawdata) <- "latin1"
  docs <- Corpus(VectorSource(iconv(rawdata, from = "latin1", to = "UTF-8", sub = "")))
  
  toSpace <- content_transformer(function(x , pattern) gsub(pattern, " ", x))
  
  docs <- gsub("[^\\b]*(<style).*?(</style>)", " ", docs)
  docs <- Corpus(VectorSource(gsub("[^\\b]*(<script).*?(</script>)", " ", docs)))
  docs <- tm_map(docs, toSpace, "<.*?>")
  docs <- tm_map(docs, toSpace, "(//).*?[^\n]*")
  docs <- tm_map(docs, toSpace, "/")
  docs <- tm_map(docs, toSpace, "\\\\t")
  docs <- tm_map(docs, toSpace, "\\\\n")
  docs <- tm_map(docs, toSpace, "\\\\")
  docs <- tm_map(docs, toSpace, "@")
  docs <- tm_map(docs, toSpace, "\\|")
  
  docs <- tm_map(docs, toSpace, "\\\"")
  docs <- tm_map(docs, toSpace, ",")
  RemoveHtmlDocs <- tm_map(docs, stripWhitespace)
  
  return(as.character(RemoveHtmlDocs)[1])
}

Update - Things I have already tried

  1. Escaping characters that may cause problems, such as single quotes, double quotes and backslashes
  2. Assigning the whole HTML string to an R variable through eval and then running the function on that variable

New Update - Question Solved

  1. Unescaped characters (single quotes, double quotes and backslashes) were causing the problem.
  2. Another line that was no longer necessary was also causing an error, because I had not commented it out or removed it.

Thanks all! :) See my answer below for the details.


Solution

  • The unescaped characters were the issue. To solve it, I escaped backslashes and quotes before concatenating the data into the R command. I created this method to make that simpler:

    // Escapes backslashes, then single and double quotes, so the data can be
    // embedded inside a quoted R string literal passed to eval().
    public static String Regexer(String data) {
        return data.replace("\\", "\\\\").replace("'", "\\'").replace("\"", "\\\"");
    }
    

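    The escape characters are themselves escaped on the Java side so that R receives \\, \' and \" inside the string literal and parses them back into a literal backslash, single quote and double quote. Backslashes have to be replaced before the quotes; otherwise the backslashes added in front of the quotes would be doubled too.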

    Tip: don't forget to convert the returned REXP to a Java value, for example with asString(). :)
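A self-contained sketch of how the escaping plugs into the eval command string. escapeForR performs the same three replacements as the Regexer method, written with String.replace (which matches literally, so only Java string-literal escaping is needed); escapeWrongOrder and buildCall are hypothetical names added only for this illustration, and the Rserve call itself is left out so the example runs without an R server.

```java
public class EscapeForRDemo {

    // Same replacements as Regexer: backslashes first, then ' and ".
    public static String escapeForR(String data) {
        return data.replace("\\", "\\\\")
                   .replace("'", "\\'")
                   .replace("\"", "\\\"");
    }

    // For comparison only: quotes first, then backslashes (incorrect order).
    public static String escapeWrongOrder(String data) {
        return data.replace("'", "\\'")
                   .replace("\"", "\\\"")
                   .replace("\\", "\\\\");
    }

    // Hypothetical helper: builds the R source string that would go to eval().
    public static String buildCall(String function, String data) {
        return function + "('" + escapeForR(data) + "')";
    }

    public static void main(String[] args) {
        String html = "<a title=\"It's\">C:\\tmp</a>";
        System.out.println(buildCall("RemoveHtml", html));
        System.out.println(escapeWrongOrder("It's"));
    }
}
```

Running main prints RemoveHtml('<a title=\"It\'s\">C:\\tmp</a>'), which R reads back as the original markup. The wrong-order version turns It's into It\\'s: R sees \\ as one escaped backslash, so the quote after it ends the literal early and the syntax error comes back.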