
How to handle two step conversion in XML using R?


I am parsing an XML document in R with the XML package. The document uses the following entity escapes:

 <  &lt;
 >  &gt;
 &  &amp;

Input text:
       My age is &amp;gt; 65 years.

Actual output: My age is gt;65 years.

Expected output: My age is >65 years.

  How can I do the conversion in two steps: (1) convert &amp; into &, then (2) convert &gt; into '>'?

Solution

  • You could write a function like this

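    # Apply each pattern/replacement pair to string, one row at a time.
    # Column 1 holds the pattern (a regular expression), column 2 the replacement.
    batchgsub <- function(patternmatrix, string) {
        # seq_len() gives an empty loop if the matrix has no rows
        for (i in seq_len(nrow(patternmatrix))) {
            p <- patternmatrix[i, 1]
            r <- patternmatrix[i, 2]
            string <- gsub(p, r, string)
        }
        return(string)
    }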
    

    and specify the patterns and their replacements as a two-column matrix, one pair per row:

    > pm
         [,1]    [,2]
    [1,] "&amp;" "&" 
    [2,] "&gt;"  ">" 
    

    Each replacement runs on the result of the previous one, so you can "chain" as many replacements as you want. Here &amp;gt; first becomes &gt; and then >.

    > s <- "My age is &amp;gt; 65 years."
    > batchgsub(pm, s)
    [1] "My age is > 65 years."
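
The pattern matrix above is shown only as printed output. A minimal, self-contained sketch of how it could be built and applied follows. The helper name `decode_entities` and the extra `&lt;` row are assumptions for illustration, and `fixed = TRUE` is used so the entities are matched as literal strings rather than regular expressions.

```r
# Replacement table: one row per entity, column 1 = text to find, column 2 = replacement.
# "&amp;" comes first so that "&amp;gt;" turns into "&gt;" before "&gt;" is handled.
pm <- matrix(
  c("&amp;", "&",
    "&lt;",  "<",
    "&gt;",  ">"),
  ncol = 2, byrow = TRUE
)

# Hypothetical helper: apply each row of `patterns` to `string`, in row order.
decode_entities <- function(patterns, string) {
  for (i in seq_len(nrow(patterns))) {
    # fixed = TRUE matches the pattern literally instead of as a regex
    string <- gsub(patterns[i, 1], patterns[i, 2], string, fixed = TRUE)
  }
  string
}

s <- "My age is &amp;gt; 65 years."
decoded <- decode_entities(pm, s)
print(decoded)
```

Row order matters: with `&gt;` listed before `&amp;`, the same input would stop at `&gt;`. Note also that decoding twice like this turns text that was meant to stay as a literal `&gt;` into `>`, so it probably only suits input known to be double-escaped.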