
Different behavior between pdf_document and bookdown::pdf_document2 when using compareGroups


I am having an issue when knitting documents with bookdown::pdf_document2 that doesn't appear when using the standard pdf_document.

Specifically, I am using the compareGroups library and the export2md function to output comparison tables such as the one shown below:

[Image: comparison table rendered correctly with pdf_document]

This works when I use output: pdf_document. However, the table is not created properly when I use output: bookdown::pdf_document2.

[Image: the same table rendered incorrectly with bookdown::pdf_document2]

There are clearly differences in the tex files, and I am able to manually copy the table from the tex output by pdf_document into the tex output by pdf_document2. Does anyone have any thoughts on how to get bookdown to create the table correctly? I have created a repo that reproduces the bug, with more details: https://github.com/vitallish/bookdown-bug


Solution

  • Overview

    bookdown::pdf_document2() differs from rmarkdown::pdf_document(): the former sets $knitr$opts_knit$kable.force.latex to TRUE, while the latter leaves it at its default value (FALSE).

    Check the .md file

    I assumed that the step from .md to .tex is the same for both formats, so the difference in the .tex files is probably due to a difference in the .md files. I ran the following code to keep the intermediate .md files.

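    # clean = FALSE keeps the intermediate files produced while rendering
    rmarkdown::render('pdf_document.Rmd', clean = FALSE)
    # drop the .utf8.md copy, leaving the .knit.md file to inspect
    file.remove('pdf_document.utf8.md')
    
    rmarkdown::render('pdf_document2.Rmd', clean = FALSE)
    file.remove('pdf_document2.utf8.md')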
    

    pdf_document.knit.md

    Table: Summary descriptives table by groups of `Sex'
    
    Var                                               Male   N=1101    Female   N=1193    p.overall 
    -----------------------------------------------  ---------------  -----------------  -----------
    Recruitment year:                                                                       0.506   
    &nbsp;&nbsp;&nbsp;&nbsp;1995                       206 (18.7%)       225 (18.9%)                
    &nbsp;&nbsp;&nbsp;&nbsp;2000                       390 (35.4%)       396 (33.2%)                
    &nbsp;&nbsp;&nbsp;&nbsp;2005                       505 (45.9%)       572 (47.9%)                
    Age                                                54.8 (11.1)       54.7 (11.0)        0.840   
    Smoking status:                                                                        <0.001   
    &nbsp;&nbsp;&nbsp;&nbsp;Never smoker               301 (28.1%)       900 (77.5%)                
    &nbsp;&nbsp;&nbsp;&nbsp;Current or former < 1y     410 (38.3%)       183 (15.7%)                
    &nbsp;&nbsp;&nbsp;&nbsp;Former >= 1y               360 (33.6%)       79 (6.80%)                 
    Systolic blood pressure                            134 (18.9)        129 (21.2)        <0.001   
    Diastolic blood pressure                           81.7 (10.2)       77.8 (10.5)       <0.001  
    

    pdf_document2.knit.md

    \begin{table}
    
    \caption{(\#tab:md-output)Summary descriptives table by groups of `Sex'}
    \centering
    \begin{tabular}[t]{l|c|c|c}
    \hline
    Var & Male   N=1101 & Female   N=1193 & p.overall\\
    \hline
    Recruitment year: &  &  & 0.506\\
    \hline
    \&nbsp;\&nbsp;\&nbsp;\&nbsp;1995 & 206 (18.7\%) & 225 (18.9\%) & \\
    \hline
    \&nbsp;\&nbsp;\&nbsp;\&nbsp;2000 & 390 (35.4\%) & 396 (33.2\%) & \\
    \hline
    \&nbsp;\&nbsp;\&nbsp;\&nbsp;2005 & 505 (45.9\%) & 572 (47.9\%) & \\
    \hline
    Age & 54.8 (11.1) & 54.7 (11.0) & 0.840\\
    \hline
    Smoking status: &  &  & <0.001\\
    \hline
    \&nbsp;\&nbsp;\&nbsp;\&nbsp;Never smoker & 301 (28.1\%) & 900 (77.5\%) & \\
    \hline
    \&nbsp;\&nbsp;\&nbsp;\&nbsp;Current or former < 1y & 410 (38.3\%) & 183 (15.7\%) & \\
    \hline
    \&nbsp;\&nbsp;\&nbsp;\&nbsp;Former >= 1y & 360 (33.6\%) & 79 (6.80\%) & \\
    \hline
    Systolic blood pressure & 134 (18.9) & 129 (21.2) & <0.001\\
    \hline
    Diastolic blood pressure & 81.7 (10.2) & 77.8 (10.5) & <0.001\\
    \hline
    \end{tabular}
    \end{table}
    

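    In the second file kable has already emitted a LaTeX table, and the &nbsp; entities are escaped as \&nbsp;, so they are printed literally instead of indenting the rows. That explains why the PDF output looks different.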

    Explore

    To explore the cause further, I compared the two output format objects:

    > pdf1 <- rmarkdown::pdf_document()
    > pdf2 <- bookdown::pdf_document2()
    > all.equal(pdf1, pdf2)
    [1] "Length mismatch: comparison on first 11 components"                                      
    [2] "Component “knitr”: Component “opts_knit”: target is NULL, current is list"               
    [3] "Component “pandoc”: Component “args”: Lengths (8, 12) differ (string compare on first 8)"
    [4] "Component “pandoc”: Component “args”: 8 string mismatches"                               
    [5] "Component “pandoc”: Component “ext”: target is NULL, current is character"               
    [6] "Component “pre_processor”: target, current do not match when deparsed"                   
    [7] "Component “post_processor”: target is NULL, current is function"     
    

    Since knitr converts R Markdown to pandoc Markdown, I guessed that the $knitr component causes the difference in the .md files.

    > all.equal(pdf1$knitr, pdf2$knitr)
    [1] "Component “opts_knit”: target is NULL, current is list"
    
    > pdf2$knitr$opts_knit
    $bookdown.internal.label
    [1] TRUE
    
    $kable.force.latex
    [1] TRUE
    

    kable is the function that outputs tables, so $knitr$opts_knit$kable.force.latex might be the root cause.

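    Verify

    To test this assumption, I copied the pdf_document2 format, set the option to FALSE, and rendered with the modified format: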

    pdf3 <- pdf2
    pdf3$knitr$opts_knit$kable.force.latex = FALSE
    rmarkdown::render('pdf_document3.Rmd', clean = FALSE, output_format = pdf3)
    file.remove('pdf_document3.utf8.md')
    

    pdf_document3.knit.md

    Var                                               Male   N=1101    Female   N=1193    p.overall 
    -----------------------------------------------  ---------------  -----------------  -----------
    Recruitment year:                                                                       0.506   
    &nbsp;&nbsp;&nbsp;&nbsp;1995                       206 (18.7%)       225 (18.9%)                
    &nbsp;&nbsp;&nbsp;&nbsp;2000                       390 (35.4%)       396 (33.2%)                
    &nbsp;&nbsp;&nbsp;&nbsp;2005                       505 (45.9%)       572 (47.9%)                
    Age                                                54.8 (11.1)       54.7 (11.0)        0.840   
    Smoking status:                                                                        <0.001   
    &nbsp;&nbsp;&nbsp;&nbsp;Never smoker               301 (28.1%)       900 (77.5%)                
    &nbsp;&nbsp;&nbsp;&nbsp;Current or former < 1y     410 (38.3%)       183 (15.7%)                
    &nbsp;&nbsp;&nbsp;&nbsp;Former >= 1y               360 (33.6%)       79 (6.80%)                 
    Systolic blood pressure                            134 (18.9)        129 (21.2)        <0.001   
    Diastolic blood pressure                           81.7 (10.2)       77.8 (10.5)       <0.001 
    

    Whoa! The Markdown table is back, so the assumption holds.
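
The override used in this check can be wrapped in a small function. The following is only a sketch: force_pandoc_tables() is a hypothetical helper name, not part of rmarkdown or bookdown, and the demo uses a stand-in list reduced to the two options printed earlier so that it runs without bookdown installed.

```r
# force_pandoc_tables() is a hypothetical helper, not an rmarkdown/bookdown function.
# It returns a copy of an output format with kable.force.latex switched off.
force_pandoc_tables <- function(fmt) {
  fmt$knitr$opts_knit$kable.force.latex <- FALSE
  fmt
}

# Stand-in for bookdown::pdf_document2(), reduced to the fields shown above
mock_pdf2 <- list(knitr = list(opts_knit = list(
  bookdown.internal.label = TRUE,
  kable.force.latex = TRUE
)))

patched <- force_pandoc_tables(mock_pdf2)
print(patched$knitr$opts_knit$kable.force.latex)

# With the real packages installed, the same idea as in the check above would be:
# rmarkdown::render('pdf_document3.Rmd',
#                   output_format = force_pandoc_tables(bookdown::pdf_document2()))
```

Because R copies on modification, the original format object is left untouched and only the returned copy carries the changed option.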

    Advanced

    In fact, compareGroups::export2md uses knitr::kable as its workhorse:

    > compareGroups::export2md
    function (x, which.table = "descr", nmax = TRUE, header.labels = c(), 
        caption = NULL, ...) 
    {
        if (!inherits(x, "createTable")) 
            stop("x must be of class 'createTable'")
        ...
        if (ww %in% c(1)) {
            ...
            table1 <- table1[-1, , drop = FALSE]
            return(knitr::kable(table1, align = align, row.names = FALSE, 
                caption = caption[1]))
        }
        if (ww %in% c(2)) {
            table2 <- prepare(x, nmax = nmax, c())[[2]]
            ...
            return(knitr::kable(table2, align = align, row.names = FALSE, 
                caption = caption[2]))
        }
    }
    

    knitr::kable in turn uses kable.force.latex as an internal option to adjust its output. If you browse the GitHub repository of knitr, you can find the following code in the R/table.R file:

    kable = function(
      x, format, digits = getOption('digits'), row.names = NA, col.names = NA,
      align, caption = NULL, format.args = list(), escape = TRUE, ...
    ) {
    
      # determine the table format
      if (missing(format) || is.null(format)) format = getOption('knitr.table.format')
      if (is.null(format)) format = if (is.null(pandoc_to())) switch(
        out_format() %n% 'markdown',
        latex = 'latex', listings = 'latex', sweave = 'latex',
        html = 'html', markdown = 'markdown', rst = 'rst',
        stop('table format not implemented yet!')
      ) else if (isTRUE(opts_knit$get('kable.force.latex')) && is_latex_output()) {
        # force LaTeX table because Pandoc's longtable may not work well with floats
        # http://tex.stackexchange.com/q/276699/9128
        'latex'
      } else 'pandoc'
      if (is.function(format)) format = format()
      ...
      structure(res, format = format, class = 'knitr_kable')
    }
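
The branch order in this excerpt is easy to misread, so here is a minimal base-R sketch that restates it as a standalone function. resolve_kable_format() and all of its arguments are hypothetical stand-ins for the knitr internals named in the comments; the is.function(format) step and everything after the format decision are left out.

```r
# Hypothetical restatement of the format decision in the kable() excerpt.
# None of these names exist in knitr; each argument stands in for an internal call.
resolve_kable_format <- function(format = NULL,
                                 table_format_option = NULL, # getOption('knitr.table.format')
                                 pandoc_to = NULL,           # pandoc_to()
                                 out_format = "markdown",    # out_format() %n% 'markdown'
                                 force_latex = FALSE,        # opts_knit$get('kable.force.latex')
                                 latex_output = FALSE) {     # is_latex_output()
  # 1. an explicit format argument, then the global option, take precedence
  if (is.null(format)) format <- table_format_option
  if (!is.null(format)) return(format)
  # 2. not rendering through Pandoc: follow knitr's own output format
  if (is.null(pandoc_to)) {
    return(switch(out_format,
      latex = , listings = , sweave = "latex",
      html = "html", markdown = "markdown", rst = "rst",
      stop("table format not implemented yet!")
    ))
  }
  # 3. rendering through Pandoc: LaTeX only when forced and the target is LaTeX
  if (isTRUE(force_latex) && latex_output) "latex" else "pandoc"
}

# pdf_document: option not set, so kable emits a Pandoc table
pdf_document_case <- resolve_kable_format(pandoc_to = "latex", latex_output = TRUE)
# pdf_document2: option forced to TRUE, so kable emits LaTeX
pdf_document2_case <- resolve_kable_format(pandoc_to = "latex", force_latex = TRUE,
                                           latex_output = TRUE)
# a global table format is checked first and bypasses the forced branch
option_case <- resolve_kable_format(table_format_option = "pandoc", pandoc_to = "latex",
                                    force_latex = TRUE, latex_output = TRUE)
print(c(pdf_document_case, pdf_document2_case, option_case))
```

The first two cases correspond to the two .knit.md files shown earlier: a Pandoc table for pdf_document and a LaTeX table for pdf_document2.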
    

    Conclusion

    $knitr$opts_knit$kable.force.latex = TRUE causes bookdown::pdf_document2() to insert LaTeX code into the .md file, while rmarkdown::pdf_document() keeps the Markdown table, which gives pandoc the chance to produce a nicely formatted table.

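    I don't think this is a bug. Yihui Xie (the author of bookdown) presumably has a reason for this: the comment in the kable source above says that Pandoc's longtable may not work well with floats. And bookdown::pdf_document2() was never required to behave the same as rmarkdown::pdf_document().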
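
If the Pandoc table is preferred under bookdown::pdf_document2(), one possible workaround follows from the kable excerpt: getOption('knitr.table.format') is consulted before kable.force.latex, so setting that option in an early chunk should make kable fall back to a Pandoc table. This is a sketch that assumes the installed knitr still resolves the format as in the excerpt; it has not been checked against the repository from the question.

```r
# Place in an early chunk of the .Rmd, before export2md() is called.
# Assumption: kable() reads this option before checking kable.force.latex,
# as in the excerpt quoted above.
options(knitr.table.format = "pandoc")

# Confirm the option is in effect for the current R session
print(getOption("knitr.table.format"))
```

The trade-off is the one named in the knitr source comment: Pandoc may turn the table into a longtable, which may not work well with floats.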