magick image library writing an incorrect header to PDF files?


I am using the very useful magick package to read and annotate PDF files and to overlay an image on the result. The PDF file I generate looks the way I expect it to. However, when I look at the file's raw contents, the header, which I would expect to read something like %PDF-1.7, reads ‰PNG instead, like this:

[Screenshot: PDF file not showing expected header]

It looks to me as if magick is taking the format from the most recent operation, image_composite() with a PNG image, and using it for the header. If so, is this a bug? Apart from the header, the output file appears to be fine, so this doesn't seem to cause any problems, but I am curious. The following code should reproduce the issue.

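library(magick)    # image_read(), image_read_pdf(), image_annotate(), image_composite(), image_write()
library(pdftools)  # image_read_pdf() relies on pdftools to render the PDF pages

pdf_file <- "https://web.archive.org/web/20140624182842/http://www.gnupdf.org/images/d/db/Hello.pdf"
image_file <- "https://upload.wikimedia.org/wikipedia/commons/thumb/8/87/PDF_file_icon.svg/200px-PDF_file_icon.svg.png"

# Read the PNG icon to overlay, and render the PDF page(s) at 300 dpi
my_image <- image_read(image_file, density = 300)
pdfimage <- image_read_pdf(pdf_file, density = 300)

# Write some text onto the page
pdfimage2 <- image_annotate(pdfimage, "test",
                            location = "+400+700", style = "normal", weight = 400,
                            size = 42)

# Overlay the icon at an offset of +100+100 pixels from the top-left corner
pdfimage3 <- image_composite(pdfimage2, my_image, operator = "atop",
                             offset = "+100+100")

# No `format` argument here: this is the call that produces the ‰PNG header
image_write(pdfimage3, path = "C:/temp/test.pdf", density = 300, flatten = TRUE)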
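One way to check what was actually written, independent of the .pdf extension, is to look at the first few bytes of the file. The sketch below uses only base R; the helper name file_signature() is hypothetical. It relies on the standard file signatures: a PDF file starts with the ASCII text %PDF-, and a PNG file starts with the 8 bytes 89 50 4E 47 0D 0A 1A 0A. Byte 0x89 displays as ‰ in a Windows-1252 text view, which is why the header shows up as ‰PNG.

```r
# Hypothetical helper: report a file's real format from its leading bytes
# ("magic number"), ignoring whatever extension the path has.
file_signature <- function(path) {
  # readBin() accepts a file name directly; read at most 8 bytes
  head_bytes <- readBin(path, what = "raw", n = 8L)

  pdf_magic <- charToRaw("%PDF-")                                        # 25 50 44 46 2D
  png_magic <- as.raw(c(0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A))

  if (length(head_bytes) >= 5L && identical(head_bytes[1:5], pdf_magic)) {
    "pdf"
  } else if (length(head_bytes) >= 8L && identical(head_bytes[1:8], png_magic)) {
    "png"
  } else {
    "unknown"
  }
}

# Usage with the path from the question:
# file_signature("C:/temp/test.pdf")
```

A file that returns "png" here is PNG data regardless of its name, which is what the ‰PNG header indicates.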

Solution

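  • I held off answering this because the solution is embarrassingly obvious. In retrospect, I had simply assumed that, because I used image_read_pdf() and gave the output a .pdf extension, image_write() would save the result in PDF format. It doesn't: image_read_pdf() renders the pages to bitmaps, and image_write() does not infer the format from the file extension, so the format has to be specified explicitly. Adding a format = "pdf" argument to the image_write() call does exactly that.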

    image_write(pdfimage3, path = "C:/temp/test.pdf", density = 300, format = "pdf", flatten = TRUE)
    

    This results in a well-formed PDF. Problem solved. Lesson learned.

    [Screenshot: PDF file with proper header]
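A minimal sketch of the behaviour behind this fix, assuming (as far as I can tell) that image_write() takes the output format from its format argument or, when that is NULL, from the image's own format attribute, and never from the file extension. ImageMagick's built-in "rose:" image stands in for the composited page, and the tempfile() paths stand in for C:/temp/test.pdf. Besides format = "pdf", it shows image_convert(format = "pdf") as an alternative way to get a PDF.

```r
library(magick)

# Stand-in for the annotated/composited page: the built-in "rose:" image as PNG
img <- image_convert(image_read("rose:"), format = "png")

# The format image_write() falls back to when `format` is NULL (PNG here)
print(image_info(img)$format)

# No `format`: the .pdf extension is ignored, so the file holds PNG data
wrong <- tempfile(fileext = ".pdf")
image_write(img, path = wrong)

# Option 1 (the fix in the answer): pass format = "pdf" explicitly
right1 <- tempfile(fileext = ".pdf")
image_write(img, path = right1, format = "pdf")

# Option 2: convert the in-memory image to PDF first, then write it
right2 <- tempfile(fileext = ".pdf")
image_write(image_convert(img, format = "pdf"), path = right2)

# Compare the leading bytes: PNG starts with 0x89 "PNG", PDF with "%PDF-"
print(readBin(wrong, what = "raw", n = 4L))
print(rawToChar(readBin(right1, what = "raw", n = 5L)))
```

Passing format at write time leaves the in-memory image untouched; converting first may be handier if the same object is written to several places.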