r regex tidyverse stringr tidytext

`str_replace_all()` on html output (from `huxtable()`)


My R code generates some html output which I'd like to make two very simple "find and replace" type adjustments to:

  • replace R2 in the HTML with R<sup>2</sup>
  • replace [number] *** in the HTML with [number]<sup>***</sup>, i.e. remove the space and make the stars superscript.

I've been trying to do this with str_replace_all(). If I can solve my problem within the tidyverse that would be excellent.

For a reproducible example, I'll use mtcars to generate the html from huxtable::huxreg(), which is the same function that generates output in my real-life problem.

library(huxtable)
library(tidytext)

fit1 <- lm(mpg ~ disp, data = mtcars)

huxreg(fit1) %>% 
  quick_html()

which gives output that is the html version of this:

─────────────────────────────────────────────────
                                   (1)           
                        ─────────────────────────
  (Intercept)                        29.600 ***  
                                     (1.230)     
  disp                               -0.041 ***  
                                     (0.005)     
                        ─────────────────────────
  N                                  32          
  R2                                  0.718      
  logLik                            -82.105      
  AIC                               170.209      
─────────────────────────────────────────────────
  *** p < 0.001; ** p < 0.01; * p < 0.05.        

Column names: names, model1

So I tried str_replace_all() on the R2 and the ***, but my output seems unchanged. Is there a simple way for me to make this replacement?

huxreg(fit1) %>% 
  quick_html() %>% 
  str_replace_all(pattern = "R2", replacement = "R<sup>2</sup>") %>% 
  str_replace_all(pattern = " ***", replacement = "<sup>***</sup>")

Solution

  • quick_html() writes the HTML to a file (huxtable-output.html, by default) and returns NULL rather than the HTML text, so str_replace_all() has nothing to work on. You can read that file back in and run the replacements on it:

    library(huxtable)
    library(stringr)
    
    fit1 <- lm(mpg ~ disp, data = mtcars)
    filepath <- 'huxtable-output.html'
    
    huxreg(fit1) %>% 
        quick_html(file = filepath, open = FALSE)
    
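    # read the saved HTML, patch it, and write it back to the same file
    readLines(filepath) %>%
        str_replace_all(pattern = "R2", replacement = "R<sup>2</sup>") %>%
        # fixed() matches " ***" literally; as a regex, * is a quantifier
        str_replace_all(pattern = fixed(" ***"), replacement = "<sup>***</sup>") %>%
        writeLines(filepath)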
    
    # open file in browser
    browseURL(filepath)
    

    Or, as @27ϕ9 mentions in the comment below, you can use huxtable::to_html() to get the HTML as a string directly and skip the round trip through a file:

    huxreg(fit1) %>% 
        to_html() %>%
        str_replace_all(pattern = "R2", replacement = "R<sup>2</sup>") %>% 
        str_replace_all(pattern = fixed(" ***"), replacement = "<sup>***</sup>") %>% 
        writeLines(filepath)
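
The patterns can be tried out without huxtable at all. Below is a minimal sketch on a made-up character vector (`cells` is a hypothetical stand-in for cell text, not real huxtable markup). It shows the literal `fixed()` match used above and, on the assumption that you might also want one or two stars superscripted, a regex variant that only fires when the stars directly follow a digit.

```r
library(stringr)

# Hypothetical stand-ins for text found inside table cells
cells <- c("29.600 ***", "1.234 *", "R2", "*** p < 0.001; ** p < 0.01")

# Literal match, as in the answer: a space followed by exactly "***"
literal <- str_replace_all(cells, fixed(" ***"), "<sup>***</sup>")

# Regex variant: a space plus 1-3 stars, but only right after a digit,
# so the significance legend is left untouched
general <- str_replace_all(cells, "(?<=\\d) (\\*{1,3})", "<sup>\\1</sup>")

print(literal)
print(general)
```

The lookbehind `(?<=\\d)` is a design choice for this sketch: it keeps the legend row from being rewritten, at the cost of assuming every starred value ends in a digit.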
    

    Maybe better not to parse HTML with regex, though. Check out rvest and xml2 for more robust tooling designed for the purpose.
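
As a rough illustration of that DOM-based route, here is a minimal sketch using xml2. The `html` string is a hypothetical stand-in: real huxtable output carries more attributes and may nest the cell text differently, so the XPath would likely need adjusting. It finds the cell whose text is `R2`, shortens the text to `R`, and appends a `<sup>` child node.

```r
library(xml2)

# Hypothetical stand-in for the table markup (assumed structure)
html <- "<table><tr><td>R2</td><td>0.718</td></tr></table>"
doc <- read_html(html)

# Select the first cell whose whole text is "R2"
r2_cell <- xml_find_first(doc, "//td[normalize-space(.) = 'R2']")

# Replace the text with "R", then append <sup>2</sup> as a child node
xml_text(r2_cell) <- "R"
xml_add_child(r2_cell, "sup", "2")

# Serialize just the table element
table_html <- as.character(xml_find_first(doc, "//table"))
cat(table_html)
```

Because the edit targets a specific node, an "R2" that happens to appear elsewhere in the document (in an attribute, say) is not touched, which is the main advantage over a blanket string replacement.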