I have form XObjects under page1 -> Resources -> XObject -> Fm0, Fm1, Fm2, ...
So the page content is not a direct content stream under page1 -> Contents; it sits inside the form XObjects. I want to move the content stream of Fm0 into the page content stream (page1 -> Contents).
When the content stream is moved like this, the resources Fm0 relies on also have to be transferred or copied to the page-level resources:
1. The content stream needs to be copied into the page-level contents.
2. Color space objects need to be copied to page1 -> Resources -> ColorSpace.
3. ExtGState objects need to be copied to page1 -> Resources -> ExtGState.
4. Properties need to be copied to page1 -> Resources (that entry has to be created from scratch there).
I tried the following code:
private static PDDocument parseFormXobject(PDDocument document, Integer pg_ind) throws IOException {
    List<Object> tokens1 = (List<Object>) (getTokens(document, pg_ind)).get(pg_ind);
    PDStream newContents = new PDStream(document);
    OutputStream out = newContents.createOutputStream(COSName.FLATE_DECODE);
    ContentStreamWriter writer = new ContentStreamWriter(out);
    PDPage pageinner = document.getPage(pg_ind);
    PDResources resources = pageinner.getResources();
    PDResources new_resources = new PDResources();
    new_resources = resources;
    COSDictionary fntdict = new COSDictionary();
    COSDictionary imgdict = new COSDictionary();
    COSDictionary extgsdict = new COSDictionary();
    COSDictionary colordict = new COSDictionary();
    COSDictionary pattern = new COSDictionary();
    int img_count = 0;
    for (COSName xObjectName : resources.getXObjectNames()) {
        PDXObject xObject = resources.getXObject(xObjectName);
        if (xObject instanceof PDFormXObject
                && tokens1.toString().contains(xObjectName.toString()) ) {
            PDFStreamParser parser = new PDFStreamParser(((PDFormXObject) xObject).getContentStream());
            parser.parse();
            List<Object> tokens3 = parser.getTokens();
            int ind =0;
            // isTextContains checks whether the form contains a text object (a BT operator)
            if (isTextContains(tokens3)){
                for (COSName colorname :((PDFormXObject) xObject).getResources().getColorSpaceNames())
                {
                    COSName new_name = COSName.getPDFName(colorname.getName());
                    PDColorSpace pdcolor = ((PDFormXObject) xObject).getResources().getColorSpace(colorname);
                    colordict.setItem(new_name,pdcolor);
                }
                for (COSName fontName :((PDFormXObject) xObject).getResources().getFontNames() )
                {
                    COSName new_name = COSName.getPDFName(fontName.getName());
                    PDFont font =((PDFormXObject) xObject).getResources().getFont(fontName);
                    font.getCOSObject().setItem(COSName.NAME, new_name);
                    fntdict.setItem(new_name,font);
                }
                for (COSName ExtGSName :((PDFormXObject) xObject).getResources().getExtGStateNames() )
                {
                    COSName new_name = COSName.getPDFName(ExtGSName.getName());
                    PDExtendedGraphicsState ExtGState =((PDFormXObject) xObject).getResources().getExtGState(ExtGSName);
                    ExtGState.getCOSObject().setItem(COSName.NAME, new_name);
                    extgsdict.setItem(new_name,ExtGState);
                }
                imgdict.setItem(xObjectName, xObject);
                for (COSName Imgname :((PDFormXObject) xObject).getResources().getXObjectNames() )
                {
                    COSName new_name = COSName.getPDFName(Imgname.getName());
                    xObject.getCOSObject().setItem(COSName.NAME, new_name);
                    PDXObject img =((PDFormXObject) xObject).getResources().getXObject(Imgname);
                    imgdict.setItem(new_name, img);
                }
                for (COSName paternname :((PDFormXObject) xObject).getResources().getPatternNames() )
                {
                    COSName new_name = COSName.getPDFName(paternname.getName());
                    PDAbstractPattern pat = ((PDFormXObject) xObject).getResources().getPattern(paternname);
                    pat.getCOSObject().setItem(COSName.NAME, new_name);
                    pattern.setItem(new_name,pat);
                }
                for (int k=0; k< tokens1.size(); k++) {
                    if ( ((tokens1.get(k) instanceof Operator) && ((Operator)tokens1.get(k)).getName().toString().equals("Do"))
                            && ((COSName)tokens1.get(k-1)).getName().toString().equals(xObjectName.getName().toString()) ) {
                        tokens1.remove(k-1);
                        tokens1.remove(k-1);
                        tokens1.add(k-1, Operator.getOperator("q"));
                        if(((PDFormXObject) xObject).getMatrix() != null) {
                            tokens1.add(k, new COSFloat(((PDFormXObject) xObject).getMatrix().getScaleX()));
                            tokens1.add(k + 1, new COSFloat(((PDFormXObject) xObject).getMatrix().getShearY()));
                            tokens1.add(k + 2, new COSFloat(((PDFormXObject) xObject).getMatrix().getShearX()));
                            tokens1.add(k + 3, new COSFloat(((PDFormXObject) xObject).getMatrix().getScaleY()));
                            tokens1.add(k + 4, new COSFloat(((PDFormXObject) xObject).getMatrix().getTranslateX()));
                            tokens1.add(k + 5, new COSFloat(((PDFormXObject) xObject).getMatrix().getTranslateY()));
                            tokens1.add(k + 6, Operator.getOperator("cm"));
                            tokens1.add(k+7, Operator.getOperator("Q"));
                            ind =k+7;
                        }else{
                            tokens1.add(k, Operator.getOperator("Q"));
                            ind =k;
                        }
                        break;
                    }
                }
                for (int k=0; k< tokens3.size(); k++) {
                    if ( (tokens3.size() > k+1) && (tokens3.get(k+1) instanceof Operator) && (((Operator)tokens3.get(k+1)).getName().toString().equals("Do")
                            || ((Operator)tokens3.get(k+1)).getName().toString().equals("gs")
                            || ((Operator)tokens3.get(k+1)).getName().toString().equals("cs")
                            || ((Operator)tokens3.get(k+1)).getName().toString().equals("CS")) ) {
                        COSName new_name = COSName.getPDFName( ((COSName) tokens3.get(k)).getName() );
                        tokens1.add(ind+k, new_name );
                    }else if ( (tokens3.size() > k+2) && (tokens3.get(k+2) instanceof Operator)
                            && ((Operator)tokens3.get(k+2)).getName().toString().equals("Tf") ) {
                        COSName new_name = COSName.getPDFName( ((COSName) tokens3.get(k)).getName() );
                        tokens1.add(ind+k, new_name );
                    }
                    else
                        tokens1.add(ind+k,tokens3.get(k));
                }
                img_count +=1;
            }else {
                imgdict.setItem(xObjectName, xObject);
                img_count +=1;
            }
        }else
            imgdict.setItem(xObjectName, xObject);
    }
    for (COSName fontName :new_resources.getFontNames() )
    {
        PDFont font =new_resources.getFont(fontName);
        fntdict.setItem(fontName,font);
    }
    for (COSName ExtGSName :new_resources.getExtGStateNames() )
    {
        PDExtendedGraphicsState extg =new_resources.getExtGState(ExtGSName);
        extgsdict.setItem(ExtGSName,extg);
    }
    for (COSName colorname :new_resources.getColorSpaceNames() )
    {
        PDColorSpace color =new_resources.getColorSpace(colorname);
        colordict.setItem(colorname,color);
    }
    for (COSName patern :new_resources.getPatternNames() )
    {
        PDAbstractPattern pat =new_resources.getPattern(patern);
        pattern.setItem(patern,pat);
    }
    resources.getCOSObject().setItem(COSName.EXT_G_STATE,extgsdict);
    resources.getCOSObject().setItem(COSName.FONT,fntdict);
    resources.getCOSObject().setItem(COSName.XOBJECT,imgdict);
    resources.getCOSObject().setItem(COSName.COLORSPACE, colordict);
    resources.getCOSObject().setItem(COSName.PATTERN, pattern);
    writer.writeTokens(tokens1);
    out.close();
    document.getPage(pg_ind).setContents(newContents);
    document.getPage(pg_ind).setResources(resources);
    return document;
}
private static JSONObject getTokens(PDDocument oldDocument, Integer pageIndex) throws IOException {
    // Returns the content stream tokens of the given page, keyed by the page index
    JSONObject oldDocumentTokens = new JSONObject();
    PDPage pg = oldDocument.getPage(pageIndex);
    PDFStreamParser parser = new PDFStreamParser(pg);
    parser.parse();
    List<Object> tokens = PDFUtils.removeTokens(parser.getTokens());
    oldDocumentTokens.put(pageIndex, tokens);
    return oldDocumentTokens;
}
private static boolean isTextContains(List<Object> tokens3) {
    for (int k=0; k< tokens3.size(); k++) {
        if (tokens3.get(k) instanceof Operator) {
            Operator op = (Operator) tokens3.get(k);
            if(op.getName().equals("BT"))
                return true;
        }
    }
    return false;
}
But the resulting page does not render exactly like the original. Something gets lost.
There are multiple issues, some in the details, some in the concept.
When you draw an XObject, graphics state changes in that XObject don't change your current graphics state. To make sure this still is true after you copied the XObject instructions into your page content stream, you have to wrap that block into a save-graphics-state/restore-graphics-state envelope (q ... Q). You can do that by adding these two lines
    tokens1.add(ind++, Operator.getOperator("q"));
    tokens1.add(ind, Operator.getOperator("Q"));
right before your instruction copying loop
    for (int k=0; k< tokens3.size(); k++) {
        ...
    }
You assume the coordinate system in the XObject equals that of the page. It doesn't necessarily. XObjects may have a Matrix entry denoting the transformation to apply.
You don't limit the area of what is drawn by the XObject instructions. But XObjects have a BBox entry denoting the box to clip the outputs to.
XObjects may also have an OC entry denoting their optional content membership. Such a membership needs to be transformed into an equivalent optional content tagging.
XObjects can also refer to the structural parent tree via their StructParent or StructParents entry. To keep structural integrity of the document, you may have to considerably update the structure tree.
XObjects may contain a Group entry indicating that their content shall be treated as a group. In particular, in the case of transparency groups, transparency-related features then behave differently than they do for the same instructions copied into the page content.
Unless you completely analyze the effects of each bit of content drawn with some transparency and rewrite the instructions drawing it case by case, copying the instructions from the XObject to the page content stream will result in substantial differences in the displayed content.
Your code assumes that an XObject is used exactly once in the page content streams. This need not be the case: it can also be used several times or not at all.
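To illustrate the last point, here is a minimal plain-Java sketch that collects every place where a given form is invoked instead of stopping at the first hit. It is not PDFBox code: the class DoUsages and the method findInvocations are made-up names, and the tokens are modelled as plain strings ("/Fm0", "Do") as an assumption, whereas your tokens1 list holds COSName and Operator objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of PDFBox: tokens are simplified to strings.
class DoUsages {

    // Returns the index of the name operand of every "/<name> Do" pair.
    static List<Integer> findInvocations(List<String> tokens, String name) {
        List<Integer> hits = new ArrayList<>();
        // Start at 1 so that tokens.get(k - 1) is always a valid access.
        for (int k = 1; k < tokens.size(); k++) {
            if ("Do".equals(tokens.get(k)) && ("/" + name).equals(tokens.get(k - 1))) {
                hits.add(k - 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("q", "/Fm0", "Do", "Q", "/Im1", "Do", "/Fm0", "Do");
        System.out.println(findInvocations(tokens, "Fm0"));
    }
}
```

If you replace the invocations in place, it is probably easiest to walk the returned indices from the last one to the first, so that earlier indices stay valid while the list grows. An empty result means the form is never drawn on that page and nothing needs to be copied.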
In a comment you asked for references. Actually it's all in the PDF specification ISO 32000, already in the publicly available ISO 32000-1:
8.10 Form XObjects
A form XObject is a PDF content stream that is a self-contained description of any sequence of graphics objects (including path objects, text objects, and sampled images). A form XObject may be painted multiple times—either on several pages or at several locations on the same page—and produces the same results each time, subject only to the graphics state at the time it is invoked.
Thus, any number of usages on a given page is possible.
When the Do operator is applied to a form XObject, a conforming reader shall perform the following tasks:
a) Saves the current graphics state, as if by invoking the q operator (see 8.4.4, "Graphics State Operators")
b) Concatenates the matrix from the form dictionary’s Matrix entry with the current transformation matrix (CTM)
c) Clips according to the form dictionary’s BBox entry
d) Paints the graphics objects specified in the form’s content stream
e) Restores the saved graphics state, as if by invoking the Q operator (see 8.4.4, "Graphics State Operators")
When copying into the page content stream, therefore, you should equivalently use a q/Q envelope and respect the Matrix and BBox entries.
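As a rough sketch of steps a) to e), the following plain Java assembles the operators that would surround the copied form instructions: q, the Matrix via cm, the BBox as a clipping rectangle via re W n, and finally Q. The class FormEnvelope and its methods are hypothetical names and only build strings; in your code the same operators would go into tokens1 as Operator and COSFloat tokens. It assumes the Matrix is given as its six numbers (or null if the entry is absent) and the BBox as its four numbers.

```java
import java.util.Locale;

// Hypothetical helper, not part of PDFBox: builds the envelope as plain text.
class FormEnvelope {

    // Formats a number with '.' as decimal separator; whole numbers without fraction.
    static String num(double v) {
        if (v == Math.rint(v)) {
            return Long.toString((long) v);
        }
        return String.format(Locale.ROOT, "%.4f", v);
    }

    // matrix: the six Matrix numbers or null; bbox: the four BBox numbers.
    static String open(double[] matrix, double[] bbox) {
        StringBuilder sb = new StringBuilder("q\n");           // a) save graphics state
        if (matrix != null) {                                   // b) concatenate Matrix
            for (double m : matrix) {
                sb.append(num(m)).append(' ');
            }
            sb.append("cm\n");
        }
        double x = Math.min(bbox[0], bbox[2]);                  // c) clip to BBox, which
        double y = Math.min(bbox[1], bbox[3]);                  //    is in form space, so
        double w = Math.abs(bbox[2] - bbox[0]);                 //    it comes after cm
        double h = Math.abs(bbox[3] - bbox[1]);
        sb.append(num(x)).append(' ').append(num(y)).append(' ')
          .append(num(w)).append(' ').append(num(h)).append(" re W n\n");
        return sb.toString();
    }

    // e) restore graphics state; goes after the copied form instructions (step d).
    static String close() {
        return "Q\n";
    }

    public static void main(String[] args) {
        double[] matrix = {1, 0, 0, 1, 10, 20};
        double[] bbox = {0, 0, 100, 50};
        System.out.print(open(matrix, bbox) + "% copied form instructions go here\n" + close());
    }
}
```

The important detail is the order: the clip is set up after the cm operator because the BBox is expressed in the form's own coordinate system.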
8.11.3.3 Optional Content in XObjects and Annotations
In addition to marked content within content streams, form XObjects and image XObjects (see 8.8, "External Objects") and annotations (see 12.5, "Annotations") may contain an OC entry, which shall be an optional content group or an optional content membership dictionary.
A form or image XObject's visibility shall be determined by the state of the group or those of the groups referenced by the membership dictionary in conjunction with its P (or VE) entry, along with the current visibility state in the context in which the XObject is invoked (that is, whether objects are visible in the contents stream at the place where the Do operation occurred).
Thus, respect this optional content information when copying to the page content.
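One way to carry the OC entry over is the marked-content form of optional content, where the inlined instructions sit between /OC /name BDC and EMC. The plain-Java sketch below only shows the shape of that wrapper. OptionalContentWrap, wrap and the property name MC0 are made-up names, and it assumes the form's OC dictionary has already been registered under that name in the Properties dictionary of the page resources.

```java
// Hypothetical helper, not part of PDFBox: builds the wrapper as plain text.
class OptionalContentWrap {

    // propertyName: assumed to be a key in page Resources -> Properties that
    // points at the optional content group or membership dictionary.
    static String wrap(String propertyName, String inlinedContent) {
        StringBuilder sb = new StringBuilder();
        sb.append("/OC /").append(propertyName).append(" BDC\n");
        sb.append(inlinedContent);
        if (!inlinedContent.endsWith("\n")) {
            sb.append('\n');                 // keep EMC on its own line
        }
        sb.append("EMC\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(wrap("MC0", "q\n% copied form instructions go here\nQ\n"));
    }
}
```

This is where item 4 of your list comes in: without a matching Properties entry at page level, the name after /OC would not resolve to anything.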
11.6.6 Transparency Group XObjects
A transparency group is represented in PDF as a special type of group XObject (see “Group XObjects”) called a transparency group XObject. A group XObject is in turn a type of form XObject, distinguished by the presence of a Group entry in its form dictionary (see “Form Dictionaries”). The value of this entry is a subsidiary group attributes dictionary defining the properties of the group. The format and meaning of the dictionary’s contents shall be determined by its group subtype, which is specified by the dictionary’s S entry. The entries for a transparency group (subtype Transparency) are shown in Table 147.
...
Annex L
So copying from transparency groups may change the appearance substantially.
14.7.4.3 PDF Objects as Content Items
When a structure element’s content includes an entire PDF object, such as an XObject or an annotation, that is associated with a page but not directly included in the page’s content stream, the object shall be identified in the structure element’s K entry by an object reference dictionary (see Table 325).
...
14.7.4.4 Finding Structure Elements from Content Items
...
To locate the relevant parent tree entry, each object or content stream that is represented in the tree shall contain a special dictionary entry, StructParent or StructParents (see Table 326). Depending on the type of content item, this entry may appear in the page object of a page containing marked-content sequences, in the stream dictionary of a form or image XObject, in an annotation dictionary, or in any other type of object dictionary that is included as a content item in a structure element.
This and more information from the same chapter should indicate clearly that structure information after copying from XObject to page content must be overhauled.