I would like to monitor the basic quality of the figures produced in R page by page, for example the byte size of each page. Currently I can only do quality assurance on the average page size (see Limitations below). I think there must be something built in for this task that goes beyond average measures.
Code that produces 4 pages in Rplots.pdf, where I would like to know the byte size of each page in the output; any other statistics about the individual pages are also welcome.
You can get basic memory monitoring of R objects here, but I would like the measure to correspond to the output in the PDF.
# https://stat.ethz.ch/R-manual/R-devel/library/graphics/html/plot.html
require(stats) # for lowess, rpois, rnorm
plot(cars)
lines(lowess(cars))
plot(sin, -pi, 2*pi) # see ?plot.function
## Discrete Distribution Plot:
plot(table(rpois(100, 5)), type = "h", col = "red", lwd = 10,
     main = "rpois(100, lambda = 5)")
## Simple quantiles/ECDF, see ecdf() {library(stats)} for a better one:
plot(x <- sort(rnorm(47)), type = "s", main = "plot(x, type = \"s\")")
points(x, cex = .5, col = "dark red")
## TODO: summarise here the byte size of each figure (pages 1-4)
# Output: Rplots.pdf with 4 pages; I want to know the size of each page in bytes
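One base-R way to get a per-page byte size without external tools is to let the pdf() device write each page to its own file: with onefile = FALSE and a file name containing an integer format such as %02d, every new page starts a new file. This is a sketch, not a built-in per-page statistic: each single-page file repeats the PDF header, fonts and cross-reference table, so its size is only an approximation of what that page contributes to Rplots.pdf. The temporary directory and the "page%02d.pdf" pattern are illustrative choices.

```r
# Write each page of the example plots to its own PDF and report the sizes.
# Assumption: the output directory and "page%02d.pdf" pattern are illustrative.
outdir <- file.path(tempdir(), "pages")
dir.create(outdir, showWarnings = FALSE)

# onefile = FALSE: every new page opens a new file named from the pattern
pdf(file.path(outdir, "page%02d.pdf"), onefile = FALSE)
plot(cars)
lines(lowess(cars))
plot(sin, -pi, 2*pi)
plot(table(rpois(100, 5)), type = "h", col = "red", lwd = 10,
     main = "rpois(100, lambda = 5)")
plot(x <- sort(rnorm(47)), type = "s", main = "plot(x, type = \"s\")")
points(x, cex = .5, col = "dark red")
invisible(dev.off())

# One row per page file, with its size in bytes (like the size column of ls -l)
files <- sort(list.files(outdir, pattern = "^page[0-9]+\\.pdf$", full.names = TRUE))
page_sizes <- data.frame(page = basename(files), size_bytes = file.size(files))
print(page_sizes)
```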
I am currently doing this basic quality assurance on the command line but would like to move some of it to R, to catch bugs faster.
Expected output: the byte size of each page, for instance like the size column of ls -l
Limitations
Code
filename <- "main.pdf"
filesize <- file.size(filename)
# http://unix.stackexchange.com/q/331175/16920
pages <- Rpoppler::PDF_info(filename)$Pages
# average page size in bytes (= filesize / pages)
pagesize <- filesize / pages
## data of example file (str() of filesize, pages, pagesize)
filesize: num 7350960
pages:    int 62
pagesize: num 118564
Input: any 62-page document
Output: average page size (118564 bytes)
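To illustrate why the average is a weak check, here is a minimal sketch that compares each page with the median page size instead, using the four per-page sizes from the test_page_size_pdf output in this question. The helper flag_large_pages() and its threshold factor = 10 are hypothetical choices, not an existing API.

```r
# Hypothetical helper: flag pages whose size is far above the median page size.
# Assumption: factor = 10 is an arbitrary illustrative threshold.
flag_large_pages <- function(sizes, factor = 10) {
  typical <- median(sizes)
  data.frame(page = seq_along(sizes),
             size_bytes = sizes,
             ratio_to_median = round(sizes / typical, 1),
             suspicious = sizes > factor * typical)
}

# Per-page sizes (bytes) from the test_page_size_pdf example
sizes <- c(4123942, 4971, 4672, 5370)
mean(sizes)              # the average blends the one large page with the small ones
flag_large_pages(sizes)
```

With these numbers the mean is about 1.03 MB, which describes none of the pages, whereas the comparison with the median singles out page 1.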
Output (but you cannot easily change the input to the PDF file you want):
     files                             size_bytes
[1,] "./test_page_size_pdf/page01.pdf" "4,123,942"
[2,] "./test_page_size_pdf/page02.pdf" " 4,971"
[3,] "./test_page_size_pdf/page03.pdf" " 4,672"
[4,] "./test_page_size_pdf/page04.pdf" " 5,370"
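A table in this files / size_bytes shape can be produced from any directory of single-page PDFs with base R alone. This is a minimal sketch of one way to do it, not necessarily how the output above was generated; the helper name page_size_table() is hypothetical and the default directory is taken from the example.

```r
# Hypothetical helper: list page*.pdf files in a directory with their sizes,
# formatted with thousands separators like the table above.
page_size_table <- function(dir = "./test_page_size_pdf") {
  files <- sort(Sys.glob(file.path(dir, "page*.pdf")))
  cbind(files = files,
        size_bytes = format(file.size(files), big.mark = ","))
}
```

Because the pattern is an argument-free glob on the directory, pointing it at the pages of a different PDF only requires splitting that PDF into the directory first.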
Input: any 64-page document
Expected output: 67 (= 64 + 3) pages analysed, not 4
R: 3.3.2
OS: Debian 8.5
Download and install the pdftk utility if it is not already on your system, and then try one of the following alternatives from within R.
1) This splits ("bursts") the PDF into one file per page (pg_0001.pdf, pg_0002.pdf, ...) and returns a data frame with the size in bytes of each page file, along with other information.
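myfile <- "Rplots.pdf"
# split into pg_0001.pdf, pg_0002.pdf, ... in the current directory
system(paste("pdftk", myfile, "burst"))
# the size column holds the byte size of each page file
file.info(Sys.glob("pg_*.pdf"))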
It will also generate a file doc_data.txt with some miscellaneous information that may or may not be of interest.
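A minimal sketch of option (1) wrapped in a function, assuming pdftk is on the PATH: it bursts into a fresh temporary directory rather than the working directory, so pg_*.pdf files left over from earlier runs cannot be mixed in, and it returns only page numbers and sizes. The function name pdf_page_sizes_burst() is hypothetical; the output pattern relies on pdftk accepting a printf-style file name for burst.

```r
# Hypothetical wrapper around "pdftk <file> burst output <pattern>".
# Assumption: pdftk is installed and on the PATH.
pdf_page_sizes_burst <- function(pdf_file) {
  outdir <- tempfile("burst_")
  dir.create(outdir)
  on.exit(unlink(outdir, recursive = TRUE))  # remove the page files afterwards
  status <- system2("pdftk",
                    c(shQuote(pdf_file), "burst", "output",
                      shQuote(file.path(outdir, "pg_%04d.pdf"))))
  if (status != 0) stop("pdftk burst failed")
  files <- sort(list.files(outdir, pattern = "^pg_[0-9]+\\.pdf$",
                           full.names = TRUE))
  data.frame(page = seq_along(files), size_bytes = file.size(files))
}
```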
1a) This alternative does not generate any files. It simply returns the size in bytes of each page (as counted by wc -c) as a numeric vector.
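myfile <- "Rplots.pdf"
# read the page count from pdftk's metadata dump
info <- readLines(pipe(paste("pdftk", myfile, "dump_data")))
pages <- as.numeric(sub("^NumberOfPages: *", "", grep("^NumberOfPages:", info, value = TRUE)))
# extract each page to stdout and count its bytes with wc -c
cmds <- sprintf("pdftk %s cat %d output - | wc -c", myfile, seq_len(pages))
unname(sapply(cmds, function(cmd) scan(pipe(cmd), quiet = TRUE)))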
The above should work if pdftk and wc are on your path. Note that on Windows, wc is included in the Rtools distribution and is typically found at "C:\\Rtools\\bin\\wc" once Rtools is installed.
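The dependency on wc can also be avoided entirely, which may be simpler on Windows. A minimal sketch, assuming pdftk is on the PATH: it writes each page to a temporary file with pdftk's cat operation and measures it with base R's file.size(), reading the page count from the NumberOfPages line of the dump_data output. The function name pdf_page_sizes_cat() is hypothetical.

```r
# Hypothetical helper: per-page byte sizes using pdftk and file.size(), no wc.
# Assumption: pdftk is installed and on the PATH.
pdf_page_sizes_cat <- function(pdf_file) {
  info <- system2("pdftk", c(shQuote(pdf_file), "dump_data"), stdout = TRUE)
  n_line <- grep("^NumberOfPages:", info, value = TRUE)
  if (length(n_line) == 0) stop("NumberOfPages not found in pdftk output")
  n_pages <- as.integer(sub("^NumberOfPages:[[:space:]]*", "", n_line[1]))
  vapply(seq_len(n_pages), function(i) {
    out <- tempfile(fileext = ".pdf")
    on.exit(unlink(out))  # delete the single-page file after measuring it
    system2("pdftk", c(shQuote(pdf_file), "cat", i, "output", shQuote(out)))
    file.size(out)
  }, numeric(1))
}
```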
2) This alternative is similar to (1) but uses the animation package:
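library(animation)
# point animation at the pdftk executable (replace with your actual path)
ani.options(pdftk = "/path/to/pdftk")
# burst into pg_0001.pdf, pg_0002.pdf, ...
pdftk("Rplots.pdf", "burst", "pg_%04d.pdf", "")
file.info(Sys.glob("pg_*.pdf"))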