I would like to convert HTML character entities like
&amp;
to &
or
&gt;
to >
Perl has the HTML::Entities package, which can do this, but I couldn't find anything similar in R.
I also tried iconv() but couldn't get satisfactory results. There may also be a way to do this with the XML package, but I haven't figured it out yet.
Update: this answer is outdated. Please check the answer below based on the new xml2 pkg.
Try something along the lines of:
# load XML package
library(XML)
# Convenience function to convert HTML entities to plain text
html2txt <- function(str) {
  xpathApply(htmlParse(str, asText = TRUE),
             "//body//text()",
             xmlValue)[[1]]
}
# html encoded string
( x <- paste("i", "s", "n", "&", "a", "p", "o", "s", ";", "t", sep = "") )
[1] "isn&apos;t"
# converted string
html2txt(x)
[1] "isn't"
UPDATE: Edited the html2txt() function so it applies to more situations
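For reference, the xml2-based approach mentioned in the update at the top could look roughly like the sketch below. It is not taken from that other answer: the function name unescape_html and the dummy <x> wrapper element are assumptions made for illustration, and it assumes the xml2 package is installed. It is written for a single string, not a character vector.

```r
library(xml2)

# Hypothetical helper (name chosen for illustration only):
# decode HTML entities in a single string.
unescape_html <- function(str) {
  # Wrap the input in a dummy element so a bare string is parsed as markup
  doc <- read_html(paste0("<x>", str, "</x>"))
  # Extract the text content, in which the parser has resolved the entities
  xml_text(doc)
}

unescape_html("a &amp; b &gt; c")
```

To handle several strings, one option is to call it through vapply(strings, unescape_html, character(1)).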