r-markdown, bookdown, bibtex

In R Markdown, is there a way to create a .bib file for only those keys cited in a document?


I have written a manuscript using bookdown in RStudio for a specific project, and it cites references from a BibTeX file. This is a single .bib file that I use for many documents, so it lives outside my project folder and contains many references that aren't cited in the present manuscript. To make the project easier to share, I would like to create a smaller .bib file containing only the references I actually cite in the manuscript.

Other questions have addressed how to do this for:

  1. pure TeX, using the citations listed in the .aux file. I can generate an .aux file by setting options(tinytex.clean = FALSE), but it doesn't contain any citations.
  2. pandoc/markdown, but I don't know how to apply this to R Markdown.

Does anyone know of a way to do this for an R Markdown document? Thanks!

I am using this YAML header and knitting within RStudio:

output:
  bookdown::pdf_book:
    keep_tex: yes

Full sessionInfo:

> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8       
 [4] LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] compiler_3.6.3  bookdown_0.20   htmltools_0.4.0 tools_3.6.3     yaml_2.2.0     
 [6] Rcpp_1.0.3      rmarkdown_2.3   knitr_1.29      xfun_0.15       digest_0.6.25  
[11] packrat_0.5.0   rlang_0.4.7     evaluate_0.14  

Solution

  • Since you are writing in .Rmd, you can use the following R function to reduce your .bib file to the cited entries:

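    library(stringr)

    clean_bib <- function(input_file, input_bib, output_bib) {
      # Join lines with a space so a key at the end of a line still has a
      # delimiter after it and does not merge with the next line
      lines <- paste(readLines(input_file), collapse = " ")
      # Cited keys: "@", letters/digits, then one of , . space ? ! ] ;
      entries <- unique(str_match_all(lines, "@([a-zA-Z0-9]+)[,\\. \\?\\!\\]\\;]")[[1]][, 2])

      # Split the .bib file before each "@" that starts a line (the "@" is kept)
      bib <- paste(readLines(input_bib), collapse = "\n")
      bib <- unlist(strsplit(bib, "\n(?=@)", perl = TRUE))

      # Match on the entry key itself, so a key such as "a1" does not also
      # pull in "a10" or an entry that merely mentions "a1" in a field
      keys <- str_match(bib, "^@[a-zA-Z]+\\{\\s*([^,\\s]+)")[, 2]
      output <- bib[keys %in% entries]

      writeLines(output, output_bib)
    }
    # now call the function, replacing ... with the paths to the .Rmd file,
    # the full .bib file, and the .bib file to create
    clean_bib(...)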
    

    Just call it in the setup chunk, so the reduced .bib file is rewritten every time you knit.

    What does the function do? It first finds all citations in the input file, meaning any string that starts with @, contains only letters and numbers, and ends with a comma, dot, question mark, exclamation mark, semicolon, space or ]. Keys containing other characters (for example _, - or :) are not matched, so adjust the pattern to your needs.

    Then it constructs a new bib file only containing these entries.
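
The two steps described above (collect the cited keys, then keep only the matching entries) can also be sketched in base R without stringr. This is only an illustration under stated assumptions, not the function from the answer: the helper names `cited_keys` and `filter_bib` and the sample data are hypothetical, keys are assumed to consist of letters, digits, `_`, `-`, `:` and `.`, and every entry in the .bib file is assumed to start with `@` at the beginning of a line.

```r
# Hypothetical helpers for illustration; base R only.

# Collect the citation keys used in the lines of an .Rmd file.
cited_keys <- function(rmd_lines) {
  text <- paste(rmd_lines, collapse = "\n")
  # "@key" not preceded by a letter, digit or backslash, which skips
  # e-mail addresses and bookdown cross-references such as \@ref(fig:x)
  pattern <- "(?<![[:alnum:]\\\\])@[[:alnum:]_][[:alnum:]_:.-]*"
  hits <- regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
  keys <- sub("^@", "", hits)
  # Trailing punctuation belongs to the sentence, not to the key
  unique(sub("[:.-]+$", "", keys))
}

# Keep only the .bib entries whose key is in `keys`.
filter_bib <- function(bib_lines, keys) {
  bib <- paste(bib_lines, collapse = "\n")
  # Split before each "@" that starts a line; the "@" stays with its entry
  entries <- strsplit(bib, "\n(?=@)", perl = TRUE)[[1]]
  # The entry key sits between "@type{" and the first comma
  m <- regexec("^@[[:alpha:]]+\\{[[:space:]]*([^,[:space:]]+)", entries)
  entry_keys <- vapply(
    regmatches(entries, m),
    function(x) if (length(x) == 2) x[2] else NA_character_,
    character(1)
  )
  entries[entry_keys %in% keys]
}

# Made-up sample data
rmd <- c(
  "Earlier work [@smith2020; @lee_2019] disagrees with @doe-2021.",
  "See Figure \\@ref(fig:plot) or write to me@example.org."
)
bib <- c(
  "@article{smith2020,", "  title = {A}", "}",
  "",
  "@book{smith2020a,", "  title = {B}", "}",
  "@misc{lee_2019,", "  title = {C}", "}",
  "@article{doe-2021,", "  title = {D}", "}"
)

keys <- cited_keys(rmd)
kept <- filter_bib(bib, keys)
writeLines(kept, file.path(tempdir(), "cited.bib"))
```

With real files you would pass `readLines()` output for the .Rmd and .bib files instead of the sample vectors. For a bookdown project split across several .Rmd files, one option is to concatenate the lines of all of them before calling `cited_keys`.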