
PDF Clown: Copy annotations and then manipulate them


I need to copy annotations from one PDF file to another. I have used the excellent PDF Clown library but am unable to manipulate things like color, rotation, etc. Is this possible? I can see the base object information but am also unsure how to manipulate it directly.

I can copy the appearance by cloning it, but I can't "edit" it.

Thanks in advance. Alex

P.S. If Stefano, the author, is listening: is the project dead?


Solution

  • On annotations in general and Callout annotations in particular

    I looked into it a bit, and I'm afraid there is not much you can deterministically manipulate for arbitrary inputs using high-level methods. The reason is that there are numerous alternative ways to set the appearance of a Callout annotation, and PDF Clown only supports the lower-priority ways with explicit high-level methods. From highest priority downwards:

    • An explicit appearance in an AP stream. If it is given, it is used, regardless of whether this appearance looks like a Callout annotation at all, let alone like one defined by the other Callout properties.

      PDF Clown does not yet create an appearance for Callout annotations from the other values, let alone update existing appearances to reflect a change of some specific attribute (e.g. Color). For ISO 32000-2 support, PDF Clown will have to improve here, as appearance streams have become mandatory.

      If it exists, you can retrieve the appearance using getAppearance(), but you only get a FormXObject with its low-level drawing instructions, nothing Callout-specific.

      One thing you can manipulate quite easily given a FormXObject, though: you can rotate or skew the appearance by setting its Matrix accordingly, e.g.

      annotation.getAppearance().getNormal().get(null).setMatrix(AffineTransform.getRotateInstance(100, 10));
      
    • A rich text string or stream in the RC entry. Unless an appearance is given, the text in the Callout text box is generated from this rich text datum (rich text here uses an XHTML 1.0 subset for formatting).

      PDF Clown does not yet create a rich text representation of the Callout text, let alone update existing ones to reflect a change of some specific attribute (e.g. Color).

      If it exists, you can retrieve the rich text by low-level access using getBaseDataObject().get(PdfName.RC), change this string or stream, and set it again using getBaseDataObject().put(PdfName.RC, ...). Similarly, you can retrieve, manipulate, and set the rich text default style string using its name PdfName.DS instead.

    • A number of separate settings for individual aspects, from which the Callout is built in the absence of an appearance stream and (as far as the text content is concerned) a rich text string.

      PDF Clown supports (many of) these attributes, in particular if you cast the cloned annotation to StaticNote, e.g. the opacity CA using get/set/withAlpha, the border Border / BS using get/set/withBorder, the background color C using get/set/withColor, ...

      By the way, PDF Clown has an error in its line ending style (LE) support: apparently the code for the Line annotation LE property was copied without checking, and unfortunately that attribute follows a different syntax there (an array of two names for Line annotations, a single name for Callout annotations).
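    To illustrate what the Matrix call in the first bullet does, here is a plain-Java sketch (no PDF Clown involved). Two details are worth knowing. First, the two-argument AffineTransform.getRotateInstance(vecx, vecy) overload takes a direction vector, so getRotateInstance(100, 10) rotates by roughly 5.7 degrees, not by 100. Second, according to ISO 32000-1 section 12.5.5, a viewer takes the smallest upright rectangle around the Matrix-transformed BBox and scales it, horizontally and vertically independently, to fit the annotation Rect. A rotated appearance therefore also gets shrunk and possibly distorted unless the Rect is enlarged. The class name and the 200 x 100 box are made up for illustration.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;

/**
 * Sketch: effect of a rotation Matrix on an appearance, assuming the ISO 32000-1 (12.5.5) rule
 * that the transformed BBox is scaled to fit the annotation Rect. Names are hypothetical.
 */
final class AppearanceRotationSketch {

    /** Angle in degrees of AffineTransform.getRotateInstance(vecx, vecy), the direction-vector overload. */
    static double vectorRotationDegrees(double vecx, double vecy) {
        return Math.toDegrees(Math.atan2(vecy, vecx));
    }

    /** Width and height of the smallest upright rectangle enclosing a width x height BBox after applying matrix. */
    static double[] transformedBoxSize(double width, double height, AffineTransform matrix) {
        Rectangle2D bbox = new Rectangle2D.Double(0, 0, width, height);
        Rectangle2D box = matrix.createTransformedShape(bbox).getBounds2D();
        return new double[] {box.getWidth(), box.getHeight()};
    }

    public static void main(String[] args) {
        // getRotateInstance(100, 10) rotates by the angle of the vector (100, 10), not by 100 degrees
        System.out.printf("vector (100, 10): %.2f degrees%n", vectorRotationDegrees(100, 10));

        // an explicit angle: the single-argument overload expects radians
        AffineTransform matrix = AffineTransform.getRotateInstance(Math.toRadians(30));

        // hypothetical 200 x 100 BBox displayed in a Rect of the same size
        double[] box = transformedBoxSize(200, 100, matrix);

        // independent horizontal and vertical factors a viewer applies to fit the transformed box into the Rect
        System.out.printf("scale x %.3f, scale y %.3f%n", 200 / box[0], 100 / box[1]);
    }
}
```

    Here a 30 degree rotation makes the enclosing box larger in both directions, so with an unchanged Rect the content ends up noticeably squeezed vertically.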

    Your tasks

    Therefore, concerning the attributes you said you want to change:

    • Rotation: There is no rotation attribute in the Callout annotation per se (other than the flag whether or not to follow the page rotation). Thus, you cannot set a rotation as a simple annotation attribute. If the source annotation does have an appearance stream, though, you can manipulate its Matrix to rotate it inside the annotation rectangle, see above.

    • Border color and font: If your Callout has an appearance stream, you can try to parse its content using a ContentScanner and manipulate the color and font setting operations. Otherwise, if rich text information is set, you can try to parse the rich text using some XML parser and manipulate its font style attributes. Otherwise, you can parse the default appearance DA string and manipulate its font and color setting instructions.
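    For the last option, parsing the DA string, a rough sketch could split it into whitespace-separated tokens and rewrite the operands of Tf and of the fill color operators g, rg and k. This assumes the DA string contains only names, numbers and operators (no strings, arrays or comments). The class and method names are hypothetical, and this is plain Java operating on the string value you would get from getBaseDataObject().get(PdfName.DA).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Sketch: token-level edits of a default appearance (DA) string. Assumes the string only contains
 * names, numbers and operators separated by whitespace. Class and method names are hypothetical.
 */
final class DefaultAppearanceEditor {

    /** Number of operands of the fill color operators handled here, or -1 for any other token. */
    private static int colorOperandCount(String op) {
        switch (op) {
            case "g": return 1;
            case "rg": return 3;
            case "k": return 4;
            default: return -1;
        }
    }

    /** Replaces the font name operand of every Tf operation, e.g. "/Helv 12 Tf" becomes "/HeBo 12 Tf". */
    static String replaceFont(String da, String newFontName) {
        List<String> tokens = new ArrayList<>(Arrays.asList(da.trim().split("\\s+")));
        for (int i = 2; i < tokens.size(); i++) {
            if (tokens.get(i).equals("Tf")) {
                // Tf takes two operands: the font resource name and the size
                tokens.set(i - 2, "/" + newFontName);
            }
        }
        return String.join(" ", tokens);
    }

    /** Replaces every g, rg or k operation (operator plus operands) by the given operation, e.g. ".5 g". */
    static String replaceFillColor(String da, String newColorOperation) {
        List<String> result = new ArrayList<>();
        for (String token : da.trim().split("\\s+")) {
            int operands = colorOperandCount(token);
            if (operands >= 0 && result.size() >= operands) {
                // drop the operands already copied, then append the replacement operation
                result.subList(result.size() - operands, result.size()).clear();
                result.add(newColorOperation);
            } else {
                result.add(token);
            }
        }
        return String.join(" ", result);
    }

    public static void main(String[] args) {
        String da = "/Helv 12 Tf 0.25 0.5 0.75 rg";
        System.out.println(replaceFillColor(replaceFont(da, "HeBo"), ".5 g"));
    }
}
```

    Unlike a regular expression such as ". . . rg", this also catches color operands with more than one character, like "0.25 0.5 0.75 rg".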

    Some example code

    I created a file with an example Callout annotation using Adobe Acrobat: Callout-Yellow.pdf. It contains an appearance stream, rich text, and simple attributes, so one can use this file for example manipulations at different levels.

    Then I applied this code to it with different values for keepAppearanceStream and keepRichText. You didn't mention whether you use PDF Clown for Java or .NET, so I chose Java; a port to .NET should be trivial, though:

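    // imports used below
    import java.awt.geom.AffineTransform;
    import java.io.File;
    import java.io.InputStream;
    import org.pdfclown.documents.Document;
    import org.pdfclown.documents.Page;
    import org.pdfclown.documents.contents.LineDash;
    import org.pdfclown.documents.contents.colorSpaces.DeviceRGBColor;
    import org.pdfclown.documents.interaction.annotations.Annotation;
    import org.pdfclown.documents.interaction.annotations.Border;
    import org.pdfclown.documents.interaction.annotations.StaticNote;
    import org.pdfclown.files.SerializationModeEnum;
    import org.pdfclown.objects.PdfName;
    import org.pdfclown.objects.PdfString;
    
    // true or false, selecting which level of the annotation definition gets manipulated
    boolean keepAppearanceStream = ...;
    boolean keepRichText = ...;
    
    // GET_STREAM_FOR(...) and RESULT_FOLDER are test harness helpers (resource loading and
    // output directory); replace them with your own stream and path handling
    try (   InputStream sourceResource = GET_STREAM_FOR("Callout-Yellow.pdf");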
            InputStream targetResource = GET_STREAM_FOR("test123.pdf");
            org.pdfclown.files.File sourceFile = new org.pdfclown.files.File(sourceResource);
            org.pdfclown.files.File targetFile = new org.pdfclown.files.File(targetResource); ) {
        Document sourceDoc = sourceFile.getDocument();
        Page sourcePage = sourceDoc.getPages().get(0);
        Annotation<?> sourceAnnotation = sourcePage.getAnnotations().get(0);
    
        Document targetDoc = targetFile.getDocument();
        Page targetPage = targetDoc.getPages().get(0);
    
        StaticNote targetAnnotation = (StaticNote) sourceAnnotation.clone(targetDoc);
    
        if (keepAppearanceStream) {
            // changing properties of an appearance
            // rotating the appearance in the appearance rectangle
            targetAnnotation.getAppearance().getNormal().get(null).setMatrix(AffineTransform.getRotateInstance(100, 10));
        } else {
            // removing the appearance to allow lower level properties changes
            targetAnnotation.setAppearance(null);
        }
    
        // changing text background color
        targetAnnotation.setColor(new DeviceRGBColor(0, 0, 1));
    
        if (keepRichText) {
            // changing rich text properties
            // (assumes RC is a text string; RC may also be a text stream, which this cast does not handle)
            PdfString richText = (PdfString) targetAnnotation.getBaseDataObject().get(PdfName.RC);
            String richTextString = richText.getStringValue();
            // replacing the font family
            richTextString = richTextString.replaceAll("font-family:Helvetica", "font-family:Courier");
            richText = new PdfString(richTextString);
            targetAnnotation.getBaseDataObject().put(PdfName.RC, richText);
        } else {
            targetAnnotation.getBaseDataObject().remove(PdfName.RC);
            targetAnnotation.getBaseDataObject().remove(PdfName.DS);
        }
    
        // changing default appearance properties
        PdfString defaultAppearance = (PdfString) targetAnnotation.getBaseDataObject().get(PdfName.DA);
        String defaultAppearanceString = defaultAppearance.getStringValue();
        // replacing the font
        defaultAppearanceString = defaultAppearanceString.replaceFirst("Helv", "HeBo");
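        // replacing the text and line color; replaceFirst interprets ". . . rg" as a regular
        // expression, so it only matches rg operations with single-character operands like "1 0 0 rg"
        defaultAppearanceString = defaultAppearanceString.replaceFirst(". . . rg", ".5 g");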
        defaultAppearance = new PdfString(defaultAppearanceString);
        targetAnnotation.getBaseDataObject().put(PdfName.DA, defaultAppearance);
    
        // changing the text value
        PdfString contents = (PdfString) targetAnnotation.getBaseDataObject().get(PdfName.Contents);
        String contentsString = contents.getStringValue();
        contentsString = contentsString.replaceFirst("text", "text line");
        contents = new PdfString(contentsString);
        targetAnnotation.getBaseDataObject().put(PdfName.Contents, contents);
    
        // change the line width and style
        targetAnnotation.setBorder(new Border(0, new LineDash(new double[] {3, 2})));
    
        targetPage.getAnnotations().add(targetAnnotation);
    
        targetFile.save(new File(RESULT_FOLDER, "test123-withCalloutCopy.pdf"),  SerializationModeEnum.Standard);
    }
    

    (test method testCopyCallout in the test class CopyCallOut)

    Beware: the code is only of proof-of-concept quality. For arbitrary PDFs you cannot simply expect a string replacement of "font-family:Helvetica" by "font-family:Courier", "Helv" by "HeBo", or ". . . rg" by ".5 g" to do the job: fonts can be specified using different style attributes or names, and different coloring instructions may be used.
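    For the rich text part of this caveat, one way to avoid depending on an exact "font-family:Helvetica" substring is to parse RC as XML and edit the CSS declarations in the style attributes. The following sketch uses only the JDK's DOM and transformer APIs. It assumes the rich text is well-formed XML, that fonts are set through font-family declarations (the font shorthand property is not handled), and that no declaration value contains a semicolon. The class name is hypothetical. The resulting string would go back into RC via getBaseDataObject().put(PdfName.RC, new PdfString(...)) as in the code above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Sketch: rewrite font-family declarations in XHTML rich text (RC). Class name is hypothetical. */
final class RichTextFontRewriter {

    /** Parses "name:value;name:value" declarations into an ordered map. */
    static Map<String, String> parseStyle(String style) {
        Map<String, String> declarations = new LinkedHashMap<>();
        for (String declaration : style.split(";")) {
            int colon = declaration.indexOf(':');
            if (colon > 0) {
                declarations.put(declaration.substring(0, colon).trim(), declaration.substring(colon + 1).trim());
            }
        }
        return declarations;
    }

    /** Serializes the declarations back into a style attribute value. */
    static String formatStyle(Map<String, String> declarations) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : declarations.entrySet()) {
            if (sb.length() > 0) sb.append(';');
            sb.append(e.getKey()).append(':').append(e.getValue());
        }
        return sb.toString();
    }

    /** Sets font-family to newFamily on every element whose style attribute already declares one. */
    static String replaceFontFamily(String richText, String newFamily) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // refuse DOCTYPEs: the rich text comes from an arbitrary PDF
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(richText)));

        NodeList elements = doc.getElementsByTagName("*");
        for (int i = 0; i < elements.getLength(); i++) {
            Element element = (Element) elements.item(i);
            if (!element.hasAttribute("style")) continue;
            Map<String, String> declarations = parseStyle(element.getAttribute("style"));
            if (declarations.containsKey("font-family")) {
                declarations.put("font-family", newFamily);
                element.setAttribute("style", formatStyle(declarations));
            }
        }

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String richText = "<body xmlns=\"http://www.w3.org/1999/xhtml\"><p style=\"font-family:Helvetica\">text</p></body>";
        System.out.println(replaceFontFamily(richText, "Courier"));
    }
}
```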

    Screenshots in Adobe Acrobat

    • The original file:

      [Screenshot: original annotation]

    • keepAppearanceStream = true:

      [Screenshot: appearance stream kept but rotated]

    • keepAppearanceStream = false and keepRichText = true:

      [Screenshot: appearance stream dropped, rich text kept but manipulated]

    • keepAppearanceStream = false and keepRichText = false:

      [Screenshot: appearance stream and rich text dropped, simple attributes manipulated]