I have an R function that removes all HTML markup from an HTML page. It works when I run it directly in R, but when I run it through Rserve it produces this error:
```
Exception in thread "main" org.rosuda.REngine.Rserve.RserveException: eval failed, request status: R parser: syntax error
    at org.rosuda.REngine.Rserve.RConnection.eval(RConnection.java:234)
    at CereScope_Data.main(CereScope_Data.java:80)
```
The Java `eval` call where I get the error:

```java
REXP lstrRemoveHtml = cobjConn.eval("RemoveHtml('" + lstrRawData + "')");
```
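To see why this call fails, here is a minimal sketch that builds the same command string and prints what R's parser would receive when the HTML contains an apostrophe. The class `RCallDemo` and the method `buildCall` are hypothetical names used only for illustration; no Rserve connection is involved.

```java
// Hypothetical demo class (not part of Rserve): shows the R source text
// produced by concatenating raw data into the command string.
final class RCallDemo {

    // Mirrors the question's call: "RemoveHtml('" + data + "')"
    static String buildCall(String data) {
        return "RemoveHtml('" + data + "')";
    }

    public static void main(String[] args) {
        String html = "<p>It's here</p>";
        // The apostrophe in "It's" closes the R string literal early.
        System.out.println(buildCall(html));
    }
}
```

This prints `RemoveHtml('<p>It's here</p>')`. R reads `'<p>It'` as the whole string argument, and the leftover `s here</p>')` is not valid R, which is exactly the kind of input that makes the parser report a syntax error.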
My R function (`rawdata` is an HTML page):
```r
RemoveHtml <- function(rawdata) {
  library("tm")
  ## Converting data to UTF-8 format and creating the corpus
  Encoding(rawdata) <- "latin1"
  docs <- Corpus(VectorSource(iconv(rawdata, from = "latin1", to = "UTF-8", sub = "")))
  ## Transformer that replaces every match of `pattern` with a space
  toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
  ## Patterns targeting <style> and <script> sections
  docs <- gsub("[^\\b]*(<style).*?(</style>)", " ", docs)
  docs <- Corpus(VectorSource(gsub("[^\\b]*(<script).*?(</script>)", " ", docs)))
  ## Removing remaining tags, "//" to end of line, and leftover special characters
  docs <- tm_map(docs, toSpace, "<.*?>")
  docs <- tm_map(docs, toSpace, "(//).*?[^\n]*")
  docs <- tm_map(docs, toSpace, "/")
  docs <- tm_map(docs, toSpace, "\\\\t")
  docs <- tm_map(docs, toSpace, "\\\\n")
  docs <- tm_map(docs, toSpace, "\\\\")
  docs <- tm_map(docs, toSpace, "@")
  docs <- tm_map(docs, toSpace, "\\|")
  docs <- tm_map(docs, toSpace, "\\\"")
  docs <- tm_map(docs, toSpace, ",")
  ## Collapsing repeated whitespace
  RemoveHtmlDocs <- tm_map(docs, stripWhitespace)
  return(as.character(RemoveHtmlDocs)[1])
}
```
Update: things I have already tried
- Escaping characters that may cause problems, such as single quotes, double quotes and backslashes
- Assigning the whole data to an R variable through `eval` and then running the function
New update: question solved
- Unescaped special characters (single quotes, double quotes and backslashes) were causing the problem.
- A leftover line that was no longer needed was also causing a problem, because I hadn't commented it out or removed it.
Thanks, all! :) See my answer for a description.
Escaping characters was the issue. To solve it, I escaped backslashes and quotes in the data before building the R command. I created this method to make it simpler:
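```java
// Escapes backslashes first, then single and double quotes, so the data
// can be embedded inside a quoted R string literal.
public static String Regexer(String Data) {
    String RegexedData = Data
            .replaceAll("\\\\", "\\\\\\\\")  // each \ becomes \\
            .replaceAll("'", "\\\\'")        // each ' becomes \'
            .replaceAll("\"", "\\\\\"");     // each " becomes \"
    return RegexedData;
}
```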
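I escaped the escape characters again in the method above so that they are still escaped when R parses the command: one level of backslashes is consumed by Java (the string literals and the `replaceAll` regex and replacement syntax), and the remaining level is what R's parser sees.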
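The same escaping can also be written with `String.replace(CharSequence, CharSequence)`, which treats both arguments as literal text instead of regex and replacement patterns, so each escape needs only one level of Java escaping. A minimal sketch, assuming the data is embedded in a single-quoted R string as in the question; `REscape`, `escapeForR` and `buildCall` are hypothetical names:

```java
// Hypothetical alternative to Regexer using literal (non-regex) replacement.
final class REscape {

    // Escape text for use inside a single-quoted R string literal.
    // Backslashes are doubled first; otherwise the backslashes added
    // in front of the quotes would be doubled as well.
    static String escapeForR(String data) {
        return data.replace("\\", "\\\\")   // \  ->  \\
                   .replace("'", "\\'")     // '  ->  \'
                   .replace("\"", "\\\""); // "  ->  \"
    }

    // Builds the R command with the escaped data, as in the question.
    static String buildCall(String data) {
        return "RemoveHtml('" + escapeForR(data) + "')";
    }

    public static void main(String[] args) {
        System.out.println(buildCall("<p>It's a \"test\" \\ done</p>"));
    }
}
```

Running `main` prints `RemoveHtml('<p>It\'s a \"test\" \\ done</p>')`, which R parses back into the original text `<p>It's a "test" \ done</p>`.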
Tip: don't forget to convert the returned `REXP` into a Java value (for example with `asString()`). :)