HTML-formatted hyperlinks not preserved in bookdown PDF

I have several HTML-formatted links in my bookdown .Rmd files that disappear in the generated PDF. The link itself appears to be ignored, and the PDF only displays the link text.

For example, <a href="https://www.cygwin.com" target="_blank">Cygwin</a> simply appears as Cygwin (no hyperlink).

But when the displayed text is the URL itself, it works fine (e.g., <a href="https://www.cygwin.com" target="_blank">https://www.cygwin.com</a>), presumably because the text is the link.

Is there a way to have bookdown preserve these html hyperlinks in the PDF output?

I am running the following to generate the PDF in RStudio:

    render_book("index.Rmd", "bookdown::pdf_book")

And the top of index.Rmd looks like this:

    title: "My Title"
    site: bookdown::bookdown_site
    documentclass: book
    link-citations: yes
    output:
      bookdown::pdf_book:
        pandoc_args: [--wrap=none]
    urlcolor: blue

Solution

Pandoc, and by extension R Markdown, just keeps the raw HTML of the links around. Raw HTML chunks are passed through to output formats that support HTML (like EPUB), but not to LaTeX (which is used to generate the PDF). Pandoc will only parse the link's content, the text between the tags, which is why it seems to work if your link text is a URL.

The simplest solution would of course be to use Markdown syntax for links instead, which is just as expressive as HTML: [Cygwin](https://www.cygwin.com){target="_blank"}. However, if that is not an option, then things get a bit hacky.

Here's a method to still parse those links. It uses a Lua filter to convert the raw HTML into a proper link. Just save the following script as parse-html-links.lua in the same directory as your Rmd file and add '--lua-filter=parse-html-links.lua' to your list of pandoc_args.

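    -- state shared across calls while we are between <a ...> and </a>
    local elements_in_link = {}
    local link_start
    local link_end

    Inline = function (el)
      if el.t == 'RawInline' and el.format:match'html.*' then
        -- opening tag: remember it and drop it from the output
        if el.text:match'<a ' then
          link_start = el.text
          return {}
        end
        -- closing tag: build the link from the collected pieces
        if el.text:match'</a' then
          link_end = el.text
          -- parse the opening and closing tags as HTML to get an empty link
          -- that carries the target and attributes
          local link = pandoc.read(link_start .. link_end, 'html').blocks[1].content[1]
          link.content = elements_in_link
          -- reset
          elements_in_link, link_start, link_end = {}, nil, nil
          return link
        end
      end
      -- collect link content
      if link_start then
        table.insert(elements_in_link, el)
        return {}
      end
      -- keep original element
      return nil
    end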
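
For reference, here is a sketch of how the header from the question might look with the filter added. It assumes the script was saved as parse-html-links.lua next to index.Rmd, as described above. Only the pandoc_args line changes.

```yaml
title: "My Title"
site: bookdown::bookdown_site
documentclass: book
link-citations: yes
output:
  bookdown::pdf_book:
    # assumption: parse-html-links.lua sits in the same directory as index.Rmd
    pandoc_args: [--wrap=none, --lua-filter=parse-html-links.lua]
urlcolor: blue
```

Rendering with the same render_book call as before should then pick up the filter.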