I have a Play Framework application (Play 2.8, Scala 2.13 and Java 8) that creates JPG thumbnails from uploaded PDF files using Apache PDFBox 2. The thumbnails are created on request and then cached on the file system. However, when a user opens the gallery with many PDFs for which no thumbnail is cached yet, many thumbnails are rendered simultaneously and the server crashes with an OutOfMemoryError (5 or 6 simultaneous tasks seem to be enough). The server restarts automatically and is available again after a few dozen seconds, but the thumbnails that were being created are corrupt, and the site keeps becoming unavailable.
PDFBox is configured to use temp files, but the OutOfMemoryError occurs while the thumbnail image is being rendered.
The server has only 2 GB of RAM available. The uploaded PDFs are around 1 MB each and the generated thumbnails are around 100 KB (72 DPI; size is about 500×1000 px). Can I fix this problem without increasing the heap size? Ideally, Play should queue these memory-intensive requests automatically, but I can live with restricting the number of simultaneous memory-intensive tasks manually, somehow…
The easiest thing would probably be to use a dedicated ExecutionContext with an underlying fixed-size thread pool for generating the thumbnails:
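
    import java.nio.file.Path
    import java.util.concurrent.Executors
    import scala.concurrent.{ ExecutionContext, Future }

    object RenderPDF {
      // Dedicated pool: at most 3 thumbnails are rendered at the same time (adjust as needed).
      implicit val ec: ExecutionContext =
        ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(3 /* adjust */))

      // Runs on `ec`, so rendering never occupies more than the pool's threads.
      def thumbnail(pdf: Path): Future[Path] = Future {
        ??? // call PDFBox here
      }
    }
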
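You can use that in your action handler to offload the generation of the thumbnails. Requests beyond the pool size wait in the executor's queue instead of rendering concurrently, which caps the memory used for rendering at any one time.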
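
To see the limiting effect without PDFBox, here is a minimal, self-contained sketch. The names `BoundedRenderDemo`, `fakeRender` and `run` are made up for illustration, and the `Thread.sleep` only stands in for the real rendering call. It submits more tasks than the pool has threads and records the highest number that ran at the same time:

```scala
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.duration._
import scala.concurrent.{ Await, ExecutionContext, Future }

object BoundedRenderDemo {

  // Submits `requests` fake render jobs to a pool of `maxParallel` threads.
  // Returns the produced file names and the peak number of concurrent jobs.
  def run(requests: Int, maxParallel: Int): (Seq[String], Int) = {
    val pool = Executors.newFixedThreadPool(maxParallel)
    implicit val renderEc: ExecutionContext = ExecutionContext.fromExecutorService(pool)

    val running = new AtomicInteger(0) // jobs currently executing
    val peak    = new AtomicInteger(0) // highest value `running` ever reached

    // Hypothetical stand-in for the PDFBox rendering call.
    def fakeRender(id: Int): String = {
      val now = running.incrementAndGet()
      peak.updateAndGet(p => math.max(p, now))
      try { Thread.sleep(50); s"thumb-$id.jpg" }
      finally running.decrementAndGet()
    }

    try {
      // All jobs are submitted at once; the excess ones wait in the pool's queue.
      val all = Future.sequence((1 to requests).map(id => Future(fakeRender(id))))
      (Await.result(all, 30.seconds), peak.get)
    } finally pool.shutdown()
  }

  def main(args: Array[String]): Unit = {
    val (names, peak) = run(requests = 12, maxParallel = 3)
    println(s"rendered ${names.size} thumbnails, peak concurrency = $peak")
  }
}
```

The peak never exceeds `maxParallel`, however many requests arrive at once. In a real Play application the pool would be created once and kept for the lifetime of the application rather than per call as in this demo, and the `Await` would be replaced by returning the `Future` from an asynchronous action.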
It would probably be even better to pre-render the thumbnails at upload time, since you are dealing with uploaded PDFs. That avoids the problem of suddenly having to render dozens of PDFs when a user opens the gallery.
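
A rough sketch of that idea, under a few assumptions of mine: the object `ThumbnailCache`, the `<name>.pdf.jpg` naming scheme and the `render` parameter (standing in for your PDFBox code) are all hypothetical. The same function can be called from the upload handler and from the gallery, because it only renders when the cached file is missing. Writing to a temporary file first and moving it into place afterwards means an interrupted render should not leave a half-written thumbnail under the final name:

```scala
import java.nio.file.{ Files, Path, StandardCopyOption }
import scala.concurrent.{ ExecutionContext, Future }

object ThumbnailCache {

  // Hypothetical naming scheme: "report.pdf" -> "<cacheDir>/report.pdf.jpg".
  def thumbnailPath(pdf: Path, cacheDir: Path): Path =
    cacheDir.resolve(pdf.getFileName.toString + ".jpg")

  // `render(pdf, target)` stands in for the PDFBox code that writes the JPG.
  // Pass the bounded rendering ExecutionContext as `ec`.
  def ensureThumbnail(pdf: Path, cacheDir: Path)(render: (Path, Path) => Unit)
                     (implicit ec: ExecutionContext): Future[Path] = Future {
    val target = thumbnailPath(pdf, cacheDir)
    if (!Files.exists(target)) {
      // Render into a temp file in the same directory, then move it into place.
      val tmp = Files.createTempFile(cacheDir, "thumb-", ".tmp")
      try {
        render(pdf, tmp)
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE)
      } finally Files.deleteIfExists(tmp) // no-op once the move has succeeded
    }
    target
  }
}
```

Calling `ensureThumbnail` right after an upload has been stored spreads the rendering work over time, and the gallery then mostly hits the cache. Note that this sketch does not deduplicate two simultaneous requests for the same PDF; both would render, and one of the moves may fail or overwrite the other depending on the file system.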