I'm working with two PDFs, one exported from Office 365 and one from Google Docs. They are not identical, but both need the same operation applied to them.
I have some preliminary code that uses Aspose to draw the same image onto both files. I'm not inclined to blame the library right away, since the exact same code produces the correct output on the Office 365 document:
// note: Anyone familiar with the PDF format itself should have no
// issues inferring the low-level operations being performed here...
fun Page.writeImage(image: InputStream) {
    val imageName = resources.images.add(image.inMemory())
    val rectangle = rectangleFromTopLeft(0.0, 0.0, 400.0, 200.0)
    val matrix = rectangle.defaultMatrix()
    contents.add(listOf(
        GSave(),
        ConcatenateMatrix(matrix),
        Do(imageName),
        GRestore()
    ))
}
The rectangle coordinates and the resulting matrix are the same regardless of which file I provide.
For the Office 365 derived PDF, the image is placed on the page exactly as I specify. Things get weird with the Google Docs derived PDF: there, the image is flipped vertically and sits at the bottom of the page!
I would love for a PDF expert to explain what's going on here. My initial suspicion is that some state set by an earlier operation in the Google Docs PDF is still in effect when my image operation runs.
That said, I'm not familiar enough (yet!) with the PDF spec to pick it out...
I don't know who you should blame, but there is a straightforward reason for the difference.
The Google Docs document has a page stream that begins with:
1 0 0 -1 0 792 cm
This flips the page vertically: the -1 negates every y coordinate, and the 792 translates the result back onto the page. That value should be the height of the page in points (792 pt matches a US Letter page).
It does not wrap this in a q ... Q pair to do a "save ... restore", which means the matrix stays in effect for everything that follows on the page. As you might already know, the PDF specification does not provide an operator to reset the current transformation matrix directly; cm can only concatenate onto it.
When you add content to the page at the end, your content now inherits this matrix, which is why you see it flipped and at the bottom.
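To make that concrete, here is a minimal Kotlin sketch of what the flip does to coordinates. The `Matrix` class and `googleFlip` value are hypothetical helpers written for this illustration, not part of Aspose, and the corner coordinates assume a 792 pt tall page with the image meant to occupy the top-left 400 x 200 area:

```kotlin
// Hypothetical helper, not an Aspose type: a PDF matrix [a b c d e f].
data class Matrix(
    val a: Double, val b: Double,
    val c: Double, val d: Double,
    val e: Double, val f: Double
) {
    // Maps a point through this matrix using the PDF row-vector convention:
    // x' = a*x + c*y + e, y' = b*x + d*y + f
    fun apply(x: Double, y: Double): Pair<Double, Double> =
        Pair(a * x + c * y + e, b * x + d * y + f)
}

// The matrix left active by "1 0 0 -1 0 792 cm" at the start of the page stream.
val googleFlip = Matrix(1.0, 0.0, 0.0, -1.0, 0.0, 792.0)

fun main() {
    // Assumed intent: top-left corner of the image at the top-left of the page.
    println(googleFlip.apply(0.0, 792.0))   // (0.0, 0.0)
    // Assumed intent: bottom-right corner of the image, 200 pt below the top edge.
    println(googleFlip.apply(400.0, 592.0)) // (400.0, 200.0)
}
```

The corner meant for the top of the page lands at the bottom, and the image's lower edge ends up above its upper edge, which matches the flipped, bottom-of-page result described in the question.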
The Microsoft file does not do this and as a result it's handled properly. In this case the matrix remains the identity matrix and you end up with your content where you expected it.
How to fix this? Well, if your library doesn't provide a way to find out what the current page matrix is, that's going to be very difficult. It can of course be solved "just for this document" by applying the inverse matrix to cancel out the stupid thing Google did here, but I can imagine that's not the general solution you're looking for.
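As a rough sketch of the "just for this document" workaround, the inverse can be computed from the six matrix numbers. The `Ctm` class below is a hypothetical helper, not an Aspose type, and it assumes you already know the leftover matrix is `1 0 0 -1 0 792`:

```kotlin
import kotlin.math.abs

// Hypothetical helper, not an Aspose type: a PDF matrix [a b c d e f].
data class Ctm(
    val a: Double, val b: Double,
    val c: Double, val d: Double,
    val e: Double, val f: Double
) {
    // PDF row-vector convention: a point goes through `this` first, then `other`.
    operator fun times(other: Ctm) = Ctm(
        a * other.a + b * other.c,
        a * other.b + b * other.d,
        c * other.a + d * other.c,
        c * other.b + d * other.d,
        e * other.a + f * other.c + other.e,
        e * other.b + f * other.d + other.f
    )

    // Inverse of the affine matrix; fails if the matrix is singular.
    fun inverse(): Ctm {
        val det = a * d - b * c
        require(det != 0.0) { "matrix is not invertible" }
        return Ctm(
            d / det, -b / det,
            -c / det, a / det,
            (c * f - d * e) / det, (b * e - a * f) / det
        )
    }

    // Tolerant comparison, so that -0.0 and 0.0 count as equal.
    fun isCloseTo(other: Ctm, eps: Double = 1e-9): Boolean =
        abs(a - other.a) <= eps && abs(b - other.b) <= eps &&
        abs(c - other.c) <= eps && abs(d - other.d) <= eps &&
        abs(e - other.e) <= eps && abs(f - other.f) <= eps

    companion object {
        val IDENTITY = Ctm(1.0, 0.0, 0.0, 1.0, 0.0, 0.0)
    }
}

// The matrix the Google Docs page stream leaves in effect.
val googleFlip = Ctm(1.0, 0.0, 0.0, -1.0, 0.0, 792.0)

fun main() {
    val undo = googleFlip.inverse()
    // Concatenating `undo` onto the leftover matrix gives back the identity.
    println((undo * googleFlip).isCloseTo(Ctm.IDENTITY)) // true
    // This particular flip happens to be its own inverse.
    println(undo.isCloseTo(googleFlip)) // true
}
```

In the code from the question, that would mean concatenating the inverse right after GSave() and before your own matrix, so that your matrix is applied on top of an identity again. How to build the library's matrix object from these six numbers is left open here.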