Search code examples
rggplot2stringrggtextmagick-r-package

Convert a column with text files into separate images in R


I have a dataset with a column storing hundreds of writing samples. My goal is to export each writing sample into a separate image. The code shared in this answer was promising given that it allowed me to convert text into an image.

df <- data.frame(
  ID = 1:2,
  Sample = c("Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. \r\r\nUt enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.", "Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.")
)

#create empty image of size 1000 px x 1000 px
h<-1000
w<-1000

#open new file for output
png("out.png", width=w, height=h)
par(mar=c(0,0,0,10), xpd=NA, mgp=c(0,0,0), oma=c(0,0,0,0), ann=F)
plot.new()
plot.window(0:1, 0:1)

#save output from your analysis
output<-df$Sample[[1]]

#add text
text(0, 0.9, output, adj = c(0,0), cex = 3) 

#close image
dev.off()

However, I have several problems and concerns with the result:

  1. The text extends beyond the right margin of the image. By contrast, I would like to start a new line every time the text reaches the right margin.
  2. Given that the writing samples have different lengths, I would like the size of the image to be tailored to each writing sample so that there is no unnecessary empty space.
  3. The current code already takes care of "\r\r\n" as a signal to start a new paragraph in the image. I simply want to stress this feature so that a solution with different code would still keep that into consideration.
  4. Of course, this solution only works for one sample at a time. It would be nice to generate a function or for-loop to automatically create all the images at once.

Lastly, the current code makes use of base R, but a solution using ggplot2 (such as here) or stringr (such as with the combination of the cat() and str_wrap() functions) would also be great. Thanks so much to anyone who will decide to help!


Solution

  • Here's a brute force solution.

    It loops through each writing sample and tries a bunch of sizes using ggtext::geom_textbox(). For each, it creates a 1000x1200 png (note, +100 on top and +100 on bottom vs. end goal).

    Then it checks the top 100 rows of pixels to see if any have text in them -- if so the font is too big and it deletes and moves on to the next. Once a font fits, it crops to 1000x1000 and moves on to the next writing sample.

    library(ggplot2)
    library(ggtext)
    library(magick)
    for(i in 1:nrow(df)) {
      for(s in seq(12, 4, by = -0.25)) {
        tec <- paste0("image_", i, "_size-", s, ".png")
        p <- ggplot(df[i,], aes(label = Sample)) +
          ggtext::geom_textbox(x = 0.5, y = 0.5, width = 1, size = s, box.size = 0) +
          theme_void()
        ggsave(tec, p, width = 1000, height = 1200, units = "px", bg = "white")
        r <- image_read(tec) %>%
          image_crop("1000x100") %>%
          as.raster() 
        overlap_pct <- mean(r == "#ffffffff")
        if(overlap_pct < 1) {
          unlink(tec)
        } else(
          {
            image_read(tec) %>%
              image_crop("1000x1000", gravity = "center") %>%
              image_write(path = tec, format = "png")
            break
          }
        )
      }
    }