I am trying to alter the redaction annotation to change the underlying text that gets burned into a PDF when you apply redactions. In Acrobat, you can set up a collection of "redaction codes" that can be used to identify why you are marking something as redacted. My goal is to overwrite what was selected by the user with a system-defined value. The code will be run prior to the redactions being applied.
In my attempts, I have discovered that the "preview" that is available in Acrobat products when hovering your cursor over a redact box is unique to Acrobat, and most other viewers won't show the preview. It also seems like the preview is maintained separately from the actual redaction that is applied. I don't need to alter the text that is shown in the preview, just what is shown after redactions are applied.
I have added a bounty of 150 reputation, as I don't think that I will be able to work out a solution on my own. My original question specified iText7, as that was the library that got me the closest in my own attempts. While I would prefer to use iText7, I will also consider solutions using other libraries that I can reasonably access (I do have a small budget that I could use to purchase another library, if I need to).
I've kept my original question and the follow-up with what I've personally tried below. I appreciate any help offered.
If you need a sample to test with, this DropBox folder has a file called 01 - Original.pdf that you can use as the source document. The desired result is to be able to change the text that appears when applying redactions from "Original Overlay Text" to any other value, such as "New Text".
I am trying to alter the text contained within every redaction annotation in a PDF, using iText7. The PdfRedactAnnotation object has a method called SetOverlayText() that looks like it should do what I want. So, I wrote a method that opens a PDF, loops through the pages, then loops through the annotations on each page, and checks if an annotation is a PdfRedactAnnotation. If it is, it calls SetOverlayText().
When debugging and looking at the annotation properties, I can see that the OverlayText has definitely changed. When I open the file and check the overlay text by hovering over a redaction marking with my cursor, however, the original overlay text is still there.
Additionally, if I apply the redactions, the original overlay text is what gets burned into the page.
However, when I right-click on the annotation (before applying redactions), the overlay text immediately gets updated to the new text.
At this point, when I apply redactions, it's the new text that is burned into the PDF.
Is there any way that I can trigger the Redaction Annotation update programmatically, without having to open and right-click on every one? I've included my code below. Thank you for any advice anyone might be able to offer.
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Annot;
using iText.Layout;

PdfDocument pdfDoc = new PdfDocument(new PdfReader(@"C:\temp\Test - Original.pdf"), new PdfWriter(@"C:\temp\Test - Output.pdf"));
Document doc = new Document(pdfDoc);
int pageCount = pdfDoc.GetNumberOfPages();
// Page numbers in iText are 1-based.
for (int i = 1; i <= pageCount; i++)
{
    var annotations = pdfDoc.GetPage(i).GetAnnotations();
    foreach (var annotation in annotations)
    {
        if (annotation is PdfRedactAnnotation)
        {
            PdfRedactAnnotation redact = (PdfRedactAnnotation)annotation;
            redact.SetOverlayText(new PdfString("New Text"));
        }
    }
}
// Closing the layout Document also closes pdfDoc and writes the output file.
doc.Close();
As @mkl's answer points out, the PDF specification for redaction annotations defines the entries of the underlying annotation dictionary. OverlayText is just one part of the equation. If you use OverlayText, a DA entry must also be defined (DA is a string that provides formatting information for the OverlayText). Finally, if RO is defined, it takes precedence over the IC, OverlayText, DA, and Q entries.
My testing document was made using Acrobat DC Pro, by manually adding a redaction in Acrobat. Doing this results in a Redact annotation with all of the above entries set. Copies of my test documents can be found in this DropBox folder.
(Side note: In my original question, I mention hovering over the redaction's red rectangle in order to preview what the applied redaction will look like... After testing in multiple browsers and other PDF Viewers like Foxit Reader, it looks like the function to 'preview' what the redaction will look like when applied by hovering your mouse over the red outline is only supported in Acrobat products. All other viewers tested will only show the red border, with nothing occurring when you hover your cursor over it. The black rectangles shown above can only be viewed in other programs after redactions have been applied.
Additional testing has shown that the hover-over preview is maintained separately from the redaction details itself, with Acrobat operating to try to keep the hover-over details in-sync with the underlying annotation. It is best to ignore the hover-over preview when testing, and refer to the results after applying redactions.)
@mkl's recommendation to remove the RO entry in order to try to let the OverlayText take priority was a good idea, but it unfortunately didn't work. There was no notable difference from my original results.
After poking around in iText7's PdfRedactAnnotation, I found that the following methods all result in a reference to the Redact object's RO entry:
PdfRedactAnnotation redact = (PdfRedactAnnotation)annotation;
redact.GetRolloverAppearanceObject();
redact.GetRedactRolloverAppearance();
redact.GetPdfObject().Get(PdfName.RO);
redact.GetAppearanceDictionary().Get(PdfName.R);
(I confirmed they are in fact the exact same reference by checking the equality comparator. As reference types, they all returned true when tested using ==.)
On further testing, I have concluded that the RO property must have a copy of the same OverlayText stored internally. If you have two redactions with different original values, you can "copy" the RO element from one redaction to another:
PdfObject ro = firstRedact.GetPdfObject().Get(PdfName.RO);
secondRedact.GetPdfObject().Put(PdfName.RO, ro);
If you do this and apply redactions, the "overlay text" from the first redact will have replaced the "overlay text" in the second. The other RO element values are also copied (such as BBox, which defines the black rectangle's dimensions)... but at least those elements can be adjusted.
The problem remains that the iText7 PdfObject of RO has 7 sub elements, and none of them or their descendant elements appear to expose the text that I'm trying to change.
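One likely reason the text does not show up among those entries: RO is a form XObject, so whatever is drawn lives in the stream's content bytes rather than in its dictionary keys. Below is a minimal sketch for dumping that content, assuming iText 7 for .NET. The class and method names (RedactOverlayInspector, DumpRedactOverlay) are hypothetical. The overlay text may still not be readable as-is, because show-text operators such as Tj and TJ carry bytes in the font's own encoding.

```csharp
using System.Text;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Annot;

public static class RedactOverlayInspector
{
    // Hypothetical helper: returns the decoded content stream of the RO
    // form XObject, or null when the annotation has no RO entry.
    public static string DumpRedactOverlay(PdfRedactAnnotation redact)
    {
        PdfStream ro = redact.GetPdfObject().GetAsStream(PdfName.RO);
        if (ro == null)
        {
            return null;
        }
        // GetBytes() returns the stream data with its filters already decoded.
        byte[] content = ro.GetBytes();
        // Latin-1 maps each byte to exactly one char, so no byte is lost.
        return Encoding.GetEncoding("ISO-8859-1").GetString(content);
    }
}
```

Calling RedactOverlayInspector.DumpRedactOverlay(redact) inside the annotation loop and printing the result should show the drawing operators that make up the overlay.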
My final test was whether I could copy RO elements from one PDF to another (so that I could use a second source PDF with an annotation with the desired RO "overlay text" already configured), but it looks like indirect objects don't like being .Put() into other documents.
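iText 7 does provide PdfObject.CopyTo(PdfDocument) for moving objects between documents, which may get around that limitation. Below is a minimal sketch, assuming a template PDF whose redaction annotation already carries the desired RO. The names RedactOverlayCopier, CopyRedactOverlay, templateRedact, targetRedact and targetDoc are hypothetical, and this has not been run against the sample files.

```csharp
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Annot;

public static class RedactOverlayCopier
{
    // Hypothetical helper: copies the RO form XObject of templateRedact
    // (read from another document) into targetDoc and attaches the copy
    // to targetRedact. Returns false when the template has no RO entry.
    public static bool CopyRedactOverlay(PdfRedactAnnotation templateRedact,
                                         PdfRedactAnnotation targetRedact,
                                         PdfDocument targetDoc)
    {
        PdfStream templateRo = templateRedact.GetPdfObject().GetAsStream(PdfName.RO);
        if (templateRo == null)
        {
            return false; // nothing to copy
        }
        // CopyTo produces a copy that belongs to targetDoc, instead of
        // reusing an indirect reference owned by the source document.
        PdfStream copiedRo = (PdfStream)templateRo.CopyTo(targetDoc);
        targetRedact.GetPdfObject().Put(PdfName.RO, copiedRo);
        return true;
    }
}
```

The template document has to stay open until the copy is made, and the copied BBox will still describe the template's rectangle, so it would probably need adjusting to fit each target annotation.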
So now, I'm left with trying to either find a way to access/alter the text stored away in RO, or to clone a preconfigured RO from another document.
The OverlayText entry of redaction annotations is specified as
Key: OverlayText
Type: text string
Value: (Optional) A text string specifying the overlay text that should be drawn over the redacted region after the affected content has been removed. This entry is ignored if the RO entry is present.
(ISO 32000-2, Table 195: Additional entries specific to a redaction annotation)
Maybe in your source PDF the redaction annotation has an RO taking precedence.
Furthermore, that table says this concerning the DA entry:
Key: DA
Type: byte string
Value: (Required if OverlayText is present, ignored otherwise) The appearance string that shall be used in formatting the overlay text when it is drawn after the affected content has been removed (see 12.7.4.3, "Variable text"). This entry is ignored if the RO entry is present.
If you use OverlayText, therefore, you also have to make sure the DA default appearance string is set. Did you?
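What these two table rows imply can be sketched as follows, assuming iText 7 for .NET: remove RO so that OverlayText is no longer ignored, and supply a DA string if none is present. The names RedactOverlayHelper and PrepareOverlayText are hypothetical, and the DA value "/Helv 10 Tf 1 0 0 rg" (10 pt /Helv, red fill) is an assumed example rather than a value taken from the question's file. Keep in mind that the question's follow-up reports that removing RO did not change what Acrobat burned in, so this shows what the specification describes, not a confirmed fix.

```csharp
using iText.Kernel.Pdf;

public static class RedactOverlayHelper
{
    // Hypothetical helper: makes OverlayText the entry that a
    // spec-conforming processor would use for the overlay.
    public static void PrepareOverlayText(PdfDictionary redactDict, string newText)
    {
        // RO takes precedence over IC, OverlayText, DA and Q, so drop it.
        redactDict.Remove(PdfName.RO);

        redactDict.Put(PdfName.OverlayText, new PdfString(newText));

        // DA is required whenever OverlayText is present; keep an existing one.
        if (!redactDict.ContainsKey(PdfName.DA))
        {
            // Assumed appearance string: 10 pt /Helv with a red fill color.
            redactDict.Put(PdfName.DA, new PdfString("/Helv 10 Tf 1 0 0 rg"));
        }
    }
}
```

Inside the annotation loop this would be called as RedactOverlayHelper.PrepareOverlayText(redact.GetPdfObject(), "New Text").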
The RO entry in the same table is specified as
Key: RO
Type: stream
Value: (Optional) A form XObject specifying the overlay appearance for this redaction annotation. After this redaction is applied and the affected content has been removed, the overlay appearance should be drawn such that its origin lines up with the lower-left corner of the annotation rectangle. This form XObject is not necessarily related to other annotation appearances, and may or may not be present in the AP dictionary. This entry takes precedence over the IC, OverlayText, DA, and Q entries.
According to the details posted above, one obvious option to proceed is to create a redaction overlay XObject (RO) for the changed redaction annotations. You can do this by replacing your
if (annotation is PdfRedactAnnotation)
{
    PdfRedactAnnotation redact = (PdfRedactAnnotation)annotation;
    redact.SetOverlayText(new PdfString("New Text"));
}
by
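// Additionally needs: using iText.Kernel.Geom; using iText.Kernel.Pdf.Canvas;
// using iText.Kernel.Pdf.Xobject; using iText.Layout.Element;
if (annotation is PdfRedactAnnotation)
{
    PdfRedactAnnotation redact = (PdfRedactAnnotation)annotation;
    redact.SetOverlayText(new PdfString("New Text"));

    // Size the overlay from the existing RO's BBox if there is one,
    // otherwise from the annotation rectangle.
    Rectangle rectangle = redact.GetRectangle().ToRectangle();
    PdfStream stream = redact.GetRedactRolloverAppearance();
    if (stream != null)
    {
        rectangle = stream.GetAsArray(PdfName.BBox).ToRectangle();
    }

    // New form XObject; the Matrix translates the BBox's lower-left corner to the origin.
    PdfFormXObject redactionOverlay = new PdfFormXObject(rectangle);
    redactionOverlay.GetPdfObject().Put(PdfName.Matrix, new PdfArray(new double[] { 1, 0, 0, 1, -rectangle.GetX(), -rectangle.GetY() }));
    // pdfDoc is the PdfDocument opened in the question's code.
    using (Canvas canvas = new Canvas(redactionOverlay, pdfDoc))
    {
        PdfCanvas pdfCanvas = canvas.GetPdfCanvas();
        pdfCanvas.SetFillColorGray(0); // black fill for the background rectangle
        pdfCanvas.Rectangle(rectangle);
        pdfCanvas.Fill();
        pdfCanvas.SetFillColorGray(1); // white fill for the overlay text
        canvas.Add(new Paragraph("New Text"));
    }

    // Use the same stream for the rollover and down appearances and for RO.
    stream = redactionOverlay.GetPdfObject();
    redact.SetRolloverAppearance(stream);
    redact.SetDownAppearance(stream);
    redact.SetRedactRolloverAppearance(stream);
}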
The result after redacting in Acrobat:
By adjusting the fill colors and the paragraph style, you can make the appearance correspond more closely to the appearances Adobe Acrobat generates (or, alternatively, generate a look entirely of your own design).
Beware: I only have a fairly old Adobe Acrobat version available (v9.5), so current versions may not accept a redaction appearance generated as above, or may apply it differently.