I have the code below to update an existing markup (a FreeText callout annotation) in a PDF using iText 7 for .NET. The updated text does not appear correctly, but after I edit the annotation in Bluebeam, the correct content is shown.
What am I missing?
```csharp
public void UpdateMarkupCallout()
{
    string inPDF = @"C:\in PDF.pdf";
    string outPDF = @"C:\out PDF.pdf";
    PdfDocument pdfDoc = new PdfDocument(new PdfReader(inPDF), new PdfWriter(outPDF));
    int numberOfPages = pdfDoc.GetNumberOfPages();
    for (int i = 1; i <= numberOfPages; i++)
    {
        PdfDictionary page = pdfDoc.GetPage(i).GetPdfObject();
        PdfArray annotArray = page.GetAsArray(PdfName.Annots);
        if (annotArray == null)
        {
            continue;
        }
        int size = annotArray.Size();
        for (int x = 0; x < size; x++)
        {
            PdfDictionary curAnnot = annotArray.GetAsDictionary(x);
            if (curAnnot.GetAsString(PdfName.Contents) != null)
            {
                string contents = curAnnot.GetAsString(PdfName.Contents).ToString();
                if (contents != "" && contents.Contains("old content"))
                {
                    // Only /Contents is changed here; /RC and /AP are left untouched
                    curAnnot.Put(PdfName.Contents, new PdfString("new content"));
                }
            }
        }
    }
    pdfDoc.Close();
}
```
The code in this answer is Java, but converting it to C# should mostly be a matter of changing letter case (e.g. getAsString to GetAsString) and a few small tweaks.
Unfortunately, there is no silver bullet solution here, at least not without significant effort.
There are several issues here. First, you are only updating the /Contents key, while the annotations you are editing also have an /RC key, which ISO 32000 defines as "a rich text string (see Adobe XML Architecture, XML Forms Architecture (XFA) Specification, version 3.3) that shall be used to generate the appearance of the annotation."
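Because /RC holds XHTML, its text can be replaced with any XML-aware tool. The sketch below uses only the JDK's DOM and transformer APIs, as a stand-in for the Jsoup calls used later in this answer. It assumes the /RC string is well-formed XHTML (no HTML-only entities such as `&nbsp;`), and, like `element.html(...)` in the answer, it drops any inline `<span>` styling inside the paragraphs. The class name `RichTextUpdater` is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RichTextUpdater {

    // Replaces the text of every <p> element in an /RC rich text string
    // with newText. Child nodes of each <p> (e.g. styled <span>s) are removed.
    static String replaceParagraphText(String richText, String newText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(richText)));
            NodeList paragraphs = doc.getElementsByTagName("p");
            for (int i = 0; i < paragraphs.getLength(); i++) {
                paragraphs.item(i).setTextContent(newText);
            }
            // Serialize back without an XML declaration
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process /RC rich text", e);
        }
    }

    public static void main(String[] args) {
        String rc = "<?xml version=\"1.0\"?>"
                + "<body xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<p dir=\"ltr\"><span style=\"font-size:12pt\">old content</span></p>"
                + "</body>";
        System.out.println(replaceParagraphText(rc, "new content"));
    }
}
```

The result would then be stored back with `curAnnot.put(PdfName.RC, new PdfString(...))`, exactly as the answer's code does with the Jsoup output.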
On top of that, the appearance (the /AP entry) must be regenerated, as dictated by the specification. iText is not currently capable of doing this, so you will have to do it yourself.
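Regenerating /AP means writing a content stream yourself. As a rough, stdlib-only sketch of what such a stream contains for a single line of text, the helper below emits the text-showing operators. The font resource name (e.g. `Helv`), size and position are assumptions: in a real appearance they would come from the annotation's /DA string and the text area, and the font must be registered in the form XObject's /Resources. Only characters covered by a simple Latin font encoding will render as expected. The class name `TextAppearanceSketch` is hypothetical.

```java
import java.util.Locale;

class TextAppearanceSketch {

    // Escapes text as a PDF literal string: backslash and both
    // parentheses must be preceded by a backslash.
    static String pdfLiteral(String text) {
        StringBuilder sb = new StringBuilder("(");
        for (char c : text.toCharArray()) {
            if (c == '\\' || c == '(' || c == ')') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append(')').toString();
    }

    // Content-stream operators drawing one black line of text whose baseline
    // starts at (x, y) in the appearance's own coordinate space.
    // fontResource must name a font in the XObject's /Resources /Font dictionary.
    static String singleLineText(String fontResource, float fontSize,
                                 float x, float y, String text) {
        return String.format(Locale.ROOT,
                "q\nBT\n/%s %.2f Tf\n0 g\n%.2f %.2f Td\n%s Tj\nET\nQ\n",
                fontResource, fontSize, x, y, pdfLiteral(text));
    }

    public static void main(String[] args) {
        System.out.print(singleLineText("Helv", 12f, 2f, 4f, "new content"));
    }
}
```

Locale.ROOT keeps the decimal separator a period regardless of the machine's locale, which matters because PDF operands must use `.`.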
You need to determine the area where the text must be drawn, taking the /RD (rectangle differences) entry into account.
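Per ISO 32000, /RD holds four numbers: the differences between /Rect and the inner box on the left, top, right and bottom sides, in that order. A minimal stdlib sketch of the arithmetic, with the hypothetical class name `RectDiff`, is shown below. Note that, inside an appearance whose /BBox is `[0 0 width height]`, the lower-left corner of the text area is (RD left, RD bottom), not (RD left, RD top).

```java
class RectDiff {

    // Inner area {llx, lly, urx, ury} in page space: /Rect shrunk by /RD,
    // where rd = {left, top, right, bottom}. A null rd means no shrinking.
    static float[] innerRect(float[] rect, float[] rd) {
        float llx = Math.min(rect[0], rect[2]);
        float lly = Math.min(rect[1], rect[3]);
        float urx = Math.max(rect[0], rect[2]);
        float ury = Math.max(rect[1], rect[3]);
        if (rd == null) {
            return new float[] {llx, lly, urx, ury};
        }
        return new float[] {llx + rd[0], lly + rd[3], urx - rd[2], ury - rd[1]};
    }

    // The same area relative to the lower-left corner of /Rect, i.e. in the
    // coordinate space of a form XObject whose /BBox is [0 0 width height].
    static float[] innerRectInAppearance(float[] rect, float[] rd) {
        float[] inner = innerRect(rect, rd);
        float llx = Math.min(rect[0], rect[2]);
        float lly = Math.min(rect[1], rect[3]);
        return new float[] {inner[0] - llx, inner[1] - lly, inner[2] - llx, inner[3] - lly};
    }
}
```

For example, a /Rect of `[100 200 300 260]` with /RD `[10 5 20 8]` gives an inner box of `[110 208 280 255]` on the page, or `[10 8 180 55]` inside the appearance.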
To create the appearance, you can use the pdfHTML add-on to convert the rich text from /RC into layout elements, draw those elements onto a form XObject, and put that XObject into /AP.
You can do that with code similar to the following:
```java
// Assumes iText 7 (kernel, layout), the pdfHTML add-on and jsoup on the classpath
PdfDocument pdfDocument = new PdfDocument(new PdfReader("in PDF.pdf"),
        new PdfWriter("out PDF.pdf"));
int numberOfPages = pdfDocument.getNumberOfPages();
for (int i = 1; i <= numberOfPages; i++) {
    PdfDictionary page = pdfDocument.getPage(i).getPdfObject();
    PdfArray annotArray = page.getAsArray(PdfName.Annots);
    if (annotArray == null) {
        continue;
    }
    int size = annotArray.size();
    for (int x = 0; x < size; x++) {
        PdfDictionary curAnnot = annotArray.getAsDictionary(x);
        if (curAnnot.getAsString(PdfName.Contents) != null) {
            String contents = curAnnot.getAsString(PdfName.Contents).toString();
            PdfString rc = curAnnot.getAsString(PdfName.RC);
            // Only touch annotations that have rich text, e.g. the FreeText with this content
            if (!contents.isEmpty() && contents.contains("old content") && rc != null) {
                curAnnot.put(PdfName.Contents, new PdfString("new content"));
                // Update the rich text (/RC) as well
                Document document = Jsoup.parse(rc.toUnicodeString());
                for (Element element : document.select("p")) {
                    element.html("new content");
                }
                curAnnot.put(PdfName.RC, new PdfString(document.body().outerHtml()));
                Rectangle bbox = curAnnot.getAsRectangle(PdfName.Rect);
                Rectangle textBbox = bbox.clone();
                // /RD holds the differences on the left, top, right and bottom sides
                PdfArray rectDiff = curAnnot.getAsArray(PdfName.RD);
                if (rectDiff != null) {
                    // applyMargins(top, right, bottom, left, reverse)
                    textBbox.applyMargins(rectDiff.getAsNumber(1).floatValue(),
                            rectDiff.getAsNumber(2).floatValue(),
                            rectDiff.getAsNumber(3).floatValue(),
                            rectDiff.getAsNumber(0).floatValue(), false);
                }
                // Lower-left corner of the text area, relative to the lower-left of /Rect
                float leftRectDiff = rectDiff != null ? rectDiff.getAsNumber(0).floatValue() : 0;
                float bottomRectDiff = rectDiff != null ? rectDiff.getAsNumber(3).floatValue() : 0;
                List<IElement> elements = HtmlConverter.convertToElements(document.body().outerHtml());
                // New normal appearance, with a coordinate space the size of /Rect
                PdfFormXObject appearance = new PdfFormXObject(
                        new Rectangle(0, 0, bbox.getWidth(), bbox.getHeight()));
                Canvas canvas = new Canvas(new PdfCanvas(appearance, pdfDocument),
                        new Rectangle(leftRectDiff, bottomRectDiff, textBbox.getWidth(), textBbox.getHeight()));
                canvas.setProperty(Property.RENDERING_MODE, RenderingMode.HTML_MODE);
                for (IElement ele : elements) {
                    if (ele instanceof IBlockElement) {
                        canvas.add((IBlockElement) ele);
                    }
                }
                canvas.close();
                PdfDictionary ap = curAnnot.getAsDictionary(PdfName.AP);
                if (ap == null) {
                    ap = new PdfDictionary();
                    curAnnot.put(PdfName.AP, ap);
                }
                ap.put(PdfName.N, appearance.getPdfObject());
            }
        }
    }
}
pdfDocument.close();
```
With this code, the new text is displayed as expected, but the overall visual representation is far from what we want: the background fill, the border and the arrows are missing. To generate the appearance properly, you would have to process further annotation entries such as /CL (callout line), /BS (border style), /C (background color) etc. This takes quite some time: reading up on the spec, parsing the relevant entries and applying them in your drawing operations. The PdfFormField class implementation may give you some inspiration.
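As a starting point for the missing parts, the sketch below emits content-stream operators for a filled, bordered box and for a polyline built from /CL. It is stdlib-only and assumes the colors and widths have already been read from the annotation (e.g. the border width from /BS /W); the class name `CalloutAppearanceSketch` is hypothetical. /CL points are in page space, so they are shifted by the lower-left corner of /Rect into the appearance's coordinate space. Arrowheads (the /LE line ending) are not drawn here.

```java
import java.util.Locale;

class CalloutAppearanceSketch {

    // Fills and strokes the text box (x, y, w, h in appearance space, e.g. the
    // /RD inner area). fill and stroke are RGB triples in 0..1. The rectangle
    // is inset by half the border width so the stroke stays inside the box.
    static String box(float x, float y, float w, float h,
                      float[] fill, float[] stroke, float borderWidth) {
        float half = borderWidth / 2;
        return String.format(Locale.ROOT,
                "q\n%.3f %.3f %.3f rg\n%.3f %.3f %.3f RG\n%.2f w\n%.2f %.2f %.2f %.2f re\nB\nQ\n",
                fill[0], fill[1], fill[2], stroke[0], stroke[1], stroke[2],
                borderWidth, x + half, y + half, w - borderWidth, h - borderWidth);
    }

    // Strokes the callout line from /CL (4 or 6 numbers in page space),
    // translated by the /Rect lower-left corner (rectLlx, rectLly).
    static String calloutLine(float[] cl, float rectLlx, float rectLly,
                              float[] stroke, float lineWidth) {
        if (cl.length != 4 && cl.length != 6) {
            throw new IllegalArgumentException("/CL must have 4 or 6 numbers");
        }
        StringBuilder sb = new StringBuilder(String.format(Locale.ROOT,
                "q\n%.3f %.3f %.3f RG\n%.2f w\n", stroke[0], stroke[1], stroke[2], lineWidth));
        for (int i = 0; i < cl.length; i += 2) {
            // First point starts the path (m), the others extend it (l)
            sb.append(String.format(Locale.ROOT, "%.2f %.2f %s\n",
                    cl[i] - rectLlx, cl[i + 1] - rectLly, i == 0 ? "m" : "l"));
        }
        return sb.append("S\nQ\n").toString();
    }
}
```

Both strings would be written to the appearance before the text, so the text is painted on top of the fill.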
If you expect the text in your annotation to be a single line of plain Latin text, and your input documents vary little in general, you can reuse the current appearance and assume that the text string is written there in one chunk (which is the case for your input document).
Note that this is a hacky approach that is prone to many potential errors.
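The core of that hack is a byte-level search-and-replace in the decoded content stream. A stdlib-only sketch of just that step follows; the class name `TjReplaceSketch` is hypothetical. It escapes both strings as PDF literals and decodes the bytes as ISO-8859-1, which maps every byte to one char and back, so the rest of the stream survives unchanged. It returns null when the text is not shown as one `(...) Tj` chunk (for example when it is split into a kerned `TJ` array or hex-encoded), in which case this approach does not apply.

```java
import java.nio.charset.StandardCharsets;

class TjReplaceSketch {

    // Escapes text as a PDF literal string: \, ( and ) get a backslash.
    static String pdfLiteral(String text) {
        StringBuilder sb = new StringBuilder("(");
        for (char c : text.toCharArray()) {
            if (c == '\\' || c == '(' || c == ')') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append(')').toString();
    }

    // Replaces "(oldText) Tj" with "(newText) Tj" in decoded content bytes.
    // Assumes plain Latin text: chars above U+00FF cannot be encoded this way.
    static byte[] replaceShownText(byte[] content, String oldText, String newText) {
        String stream = new String(content, StandardCharsets.ISO_8859_1);
        String target = pdfLiteral(oldText) + " Tj";
        if (!stream.contains(target)) {
            return null; // text split, kerned or hex-encoded: give up
        }
        return stream.replace(target, pdfLiteral(newText) + " Tj")
                .getBytes(StandardCharsets.ISO_8859_1);
    }
}
```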
Sample code:
```java
// Assumes iText 7 (kernel) and jsoup on the classpath
PdfDocument pdfDocument = new PdfDocument(new PdfReader("in PDF.pdf"),
        new PdfWriter("out PDF.pdf"));
int numberOfPages = pdfDocument.getNumberOfPages();
for (int i = 1; i <= numberOfPages; i++) {
    PdfDictionary page = pdfDocument.getPage(i).getPdfObject();
    PdfArray annotArray = page.getAsArray(PdfName.Annots);
    if (annotArray == null) {
        continue;
    }
    int size = annotArray.size();
    for (int x = 0; x < size; x++) {
        PdfDictionary curAnnot = annotArray.getAsDictionary(x);
        if (curAnnot.getAsString(PdfName.Contents) != null) {
            String contents = curAnnot.getAsString(PdfName.Contents).toString();
            String oldContent = "old content";
            PdfString rc = curAnnot.getAsString(PdfName.RC);
            if (!contents.isEmpty() && contents.contains(oldContent) && rc != null) {
                String newContent = "new content";
                curAnnot.put(PdfName.Contents, new PdfString(newContent));
                Document document = Jsoup.parse(rc.toUnicodeString());
                for (Element element : document.select("p")) {
                    element.html(newContent);
                }
                curAnnot.put(PdfName.RC, new PdfString(document.body().outerHtml()));
                PdfDictionary ap = curAnnot.getAsDictionary(PdfName.AP);
                PdfStream currentAppearance = ap != null ? ap.getAsStream(PdfName.N) : null;
                if (currentAppearance != null) {
                    // ISO-8859-1 maps each byte to one char and back, so the rest of
                    // the (decoded) content stream survives the round trip unchanged
                    String currentBytes = new String(currentAppearance.getBytes(), StandardCharsets.ISO_8859_1);
                    currentBytes = currentBytes.replace("(" + oldContent + ") Tj", "(" + newContent + ") Tj");
                    currentAppearance.setData(currentBytes.getBytes(StandardCharsets.ISO_8859_1));
                }
            }
        }
    }
}
pdfDocument.close();
```
The visual result of this approach is what we want.
Another way, which is not compliant with the PDF specification, is to remove the /AP entry altogether. You can do it in the very same loop with `curAnnot.remove(PdfName.AP);`. Most major PDF viewers will then regenerate the appearance themselves. However, my viewer generated a rather unappealing appearance.
So the result depends on the PDF viewer, which illustrates very well why the PDF specification mandates the presence of /AP. Once again, this approach is not compliant with the PDF specification.