I have a PostScript sample that illustrates creating a form. If I convert the PostScript to PDF, I can enumerate the form XObjects quite easily, but how do I get access to their content? For example:
/SForm <<
  /FormType 1               % all forms are FormType 1
  /Matrix [ 1 0 0 1 0 0 ]   % no scaling or translating
  /BBox [ 0 -10 100 100 ]   % hack - should really calculate the width of the string
                            % and the height of the font allowing for descenders etc
  /PaintProc {
    pop
    0 0 moveto              % assume that the translate has set the current point
    (XObject String) show
    0 24 moveto
    (Line Two) show
  } bind
>> def
translates to:
7 0 obj
<</Type/XObject/Subtype/Form/FormType 1/BBox[0 -10 100 100]/Resources 6 0 R/Matrix[1 0 0 1 0 0]/Length 98>>
stream
/GS1 gs
BT
/F1 1 Tf
11 0 0 11 0 0 Tm
0 g
0 Tc
0 Tw
(XObject String)Tj
0 2.1818 TD
(Line Two)Tj
ET
endstream
endobj
How can I obtain the information between stream and endstream? I had assumed that this would be a relatively simple operation, but I have not managed to retrieve the content. If I use something like the following (in my Groovy code), I get the information between << and >> (the dictionary) but not the actual PDF operators that do the marking (from the PostScript PaintProc).
def pdResources = pdDoc.getPage(pageNum).getResources()
Iterable<COSName> names = pdResources.getXObjectNames()
for (COSName name : names) {
    def xObject = pdResources.getXObject(name)
    if (xObject instanceof PDFormXObject) {
        // this prints the dictionary entries, not the operators inside the stream
        println xObject.getContentStream().dump()
    }
}
Actually, it would suit my purpose to get the content between the BT and ET operators. The main focus is to find the "definition" of the form XObject along with its content, not to explore where the form XObject is used in the page content.
Obviously I have overlooked something, but what? Thanks in advance.
xObject.getContents() returns an InputStream from which you can read the stream contents.
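As a rough illustration of what to do with that InputStream, here is a minimal sketch in plain Java (it should paste into Groovy largely unchanged). It deliberately does not call PDFBox: a ByteArrayInputStream holding a shortened copy of the stream from the question stands in for whatever xObject.getContents() returns. The class and method names (FormStreamReader, readAll, textObjects) are made up for this example. The BT/ET extraction is a naive regular expression that assumes neither keyword appears inside a string literal; a real tokenizer such as PDFBox's PDFStreamParser would be the more robust route.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox.
class FormStreamReader {

    // Naive match of BT ... ET as whole words; DOTALL lets '.' span newlines.
    // Assumption: "BT" and "ET" never occur inside a (string) operand.
    private static final Pattern TEXT_OBJECT =
            Pattern.compile("\\bBT\\b(.*?)\\bET\\b", Pattern.DOTALL);

    // Reads the whole stream into a String. ISO-8859-1 maps every byte to
    // exactly one char, so no content stream byte is lost or altered.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    // Returns the operators found between each BT/ET pair, trimmed.
    static List<String> textObjects(String content) {
        List<String> result = new ArrayList<>();
        Matcher m = TEXT_OBJECT.matcher(content);
        while (m.find()) {
            result.add(m.group(1).trim());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for xObject.getContents(): part of the stream from the question.
        String sample = "/GS1 gs\nBT\n/F1 1 Tf\n11 0 0 11 0 0 Tm\n"
                + "(XObject String)Tj\n0 2.1818 TD\n(Line Two)Tj\nET\n";
        byte[] bytes = sample.getBytes(StandardCharsets.ISO_8859_1);
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            String content = readAll(in);
            for (String block : textObjects(content)) {
                System.out.println(block);
            }
        }
    }
}
```

In the loop from the question, the stand-in stream would be replaced by xObject.getContents(), ideally inside the same try-with-resources so the stream is closed after reading.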