
Encode character to HTML in R, the CRAN way


Before voting to close as a duplicate, please make sure the linked question actually answers my particular question here. Questions may look similar, but I haven't found an answer for mine. Thank you.


I am looking for a way to convert an arbitrary scalar character string into its HTML-encoded form. I do not want to encode just <, ", etc., but the whole text.

So text of the form

"<abc at def.gh>"

should be encoded as

"&#x3c;&#x61;&#x62;&#x63;&#x20;&#x61;&#x74;&#x20;&#x64;&#x65;&#x66;&#x2e;&#x67;&#x68;&#x3e;"

My goal is compatibility with how CRAN encodes maintainers' email addresses. So < should become &#x3c; rather than &lt;, and . should become &#x2e; rather than &period;.

To see it on CRAN, visit the CRAN page of any package, e.g. https://cran.r-project.org/package=curl, then "view source" and find the Maintainer field there.

I am looking for a lightweight solution with as few dependencies as possible; it doesn't have to be fast.

For reference, an online tool to decode encoded string: https://onlineasciitools.com/convert-html-entities-to-ascii
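
For checking results locally instead of with the online tool, here is a minimal base R sketch of a decoder. The name `decode_hex_entities` is hypothetical, and it assumes the input consists only of hexadecimal references of the form `&#x..;`; anything else in the string is ignored.

```r
# Hypothetical helper: decode a string made of hexadecimal character
# references such as "&#x3c;&#x61;". Text outside "&#x...;" is ignored.
decode_hex_entities <- function(x) {
  stopifnot(is.character(x), length(x) == 1L)
  # Pull out every "&#x...;" reference
  refs <- regmatches(x, gregexpr("&#x[0-9a-fA-F]+;", x))[[1]]
  # Drop the leading "&#x" (3 characters) and the trailing ";"
  hex <- substr(refs, 4L, nchar(refs) - 1L)
  # Parse the hex digits as code points and build the UTF-8 string
  intToUtf8(strtoi(hex, base = 16L))
}

decode_hex_entities("&#x3c;&#x61;&#x62;&#x63;&#x3e;")
# [1] "<abc>"
```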


Solution

  • Here is something quick (not thoroughly tested). It was inspired by another SO answer.

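    foo <- function(x) {
      # Convert the string to UTF-8 code points and format them as lowercase hex
      intvalues <- as.hexmode(utf8ToInt(enc2utf8(x)))
      # Wrap each code point as a hexadecimal character reference and concatenate
      paste(paste0("&#x", intvalues, ";"), collapse = "")
    }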
    
    all.equal(
      foo("<abc at def.gh>"),
      "&#x3c;&#x61;&#x62;&#x63;&#x20;&#x61;&#x74;&#x20;&#x64;&#x65;&#x66;&#x2e;&#x67;&#x68;&#x3e;"
    )
    # [1] TRUE
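
The function above handles a single string. If several addresses need encoding at once, a vectorized variant might look like the sketch below. The name `html_hex_encode` is hypothetical, `sprintf("%x")` is swapped in for `as.hexmode()` so each code point is formatted without padding, and passing `NA` through unchanged is an assumption about the desired behaviour. It uses base R only.

```r
# Hypothetical vectorized variant: encode every element of a character
# vector as hexadecimal character references.
html_hex_encode <- function(x) {
  stopifnot(is.character(x))
  out <- vapply(
    enc2utf8(x),
    # One string -> code points -> "&#x..;" pieces -> single string
    function(s) paste(sprintf("&#x%x;", utf8ToInt(s)), collapse = ""),
    character(1),
    USE.NAMES = FALSE
  )
  # Assumption: missing values stay missing instead of being encoded
  out[is.na(x)] <- NA_character_
  out
}

html_hex_encode(c("<a>", "\u00e9"))
# [1] "&#x3c;&#x61;&#x3e;" "&#xe9;"
```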