Extracting vector graphics (lines and points) with pdfclown


I want to extract vector graphics (lines and points) from a PDF with PDF Clown. I have tried to wrap my head around the graphics sample, but I cannot figure out how the object model works for this. Can anyone explain the relationships?


Solution

  • You are right: up to the PDF Clown 0.1 series, high-level path modelling was not implemented (it would have been derived from ContentScanner.GraphicsWrapper).

    The next release (0.2 series, due next month) will support a high-level representation of all graphics contents, including path objects (PathElement), through the new ContentModeller. Here is an example:

    import java.awt.geom.GeneralPath;
    import java.util.List;

    import org.pdfclown.documents.contents.elements.ContentModeller;
    import org.pdfclown.documents.contents.elements.GraphicsElement;
    import org.pdfclown.documents.contents.elements.PathElement;
    // The package of ContentMarker is assumed here; adjust it if the 0.2 release differs.
    import org.pdfclown.documents.contents.objects.ContentMarker;
    import org.pdfclown.documents.contents.objects.Path;

    // 'page' is the page whose contents are being inspected.
    for(GraphicsElement<?> element : ContentModeller.model(page, Path.class))
    {
      PathElement pathElement = (PathElement)element;
      List<ContentMarker> markers = pathElement.getMarkers();
      pathElement.getBox();
      GeneralPath path = pathElement.getPath(); // the path geometry as a Java 2D shape
      pathElement.isFilled();
      pathElement.isStroked();
    }
    

    In the meantime, you can extract the low-level representation of the vector graphics by iterating over the content stream with ContentScanner, as shown in ContentScanningSample (available in the downloadable distribution), and looking for path-related operations (BeginSubpath, DrawLine, DrawRectangle, DrawCurve, ...).
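
Whichever route produces the geometry, once you hold a java.awt.geom.GeneralPath (for instance the value returned by pathElement.getPath() in the snippet above), the individual lines can be read with the standard Java 2D PathIterator. The following is only a minimal sketch that uses nothing from PDF Clown: the class name PathSegmentExtractor, the method extractLines and the flatness value are illustrative choices, not part of any library. Curves are approximated by short straight segments, which may or may not be what you want.

```java
import java.awt.Shape;
import java.awt.geom.GeneralPath;
import java.awt.geom.Line2D;
import java.awt.geom.PathIterator;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a PDF Clown class): turns any Java 2D shape into line segments.
class PathSegmentExtractor {

  // Returns the straight segments of the shape; curves are flattened to within 'flatness'.
  static List<Line2D.Double> extractLines(Shape shape, double flatness) {
    List<Line2D.Double> lines = new ArrayList<>();
    double[] coords = new double[6];
    double startX = 0, startY = 0; // first point of the current subpath
    double lastX = 0, lastY = 0;   // current point

    // A null transform keeps the coordinates as they are; a flattening iterator
    // only reports SEG_MOVETO, SEG_LINETO and SEG_CLOSE.
    for (PathIterator it = shape.getPathIterator(null, flatness); !it.isDone(); it.next()) {
      switch (it.currentSegment(coords)) {
        case PathIterator.SEG_MOVETO:
          startX = lastX = coords[0];
          startY = lastY = coords[1];
          break;
        case PathIterator.SEG_LINETO:
          lines.add(new Line2D.Double(lastX, lastY, coords[0], coords[1]));
          lastX = coords[0];
          lastY = coords[1];
          break;
        case PathIterator.SEG_CLOSE:
          // Closing a subpath adds an implicit segment back to its first point.
          if (lastX != startX || lastY != startY) {
            lines.add(new Line2D.Double(lastX, lastY, startX, startY));
          }
          lastX = startX;
          lastY = startY;
          break;
        default:
          break;
      }
    }
    return lines;
  }

  public static void main(String[] args) {
    // Stand-in for a path obtained from a PDF page: a closed triangle.
    GeneralPath path = new GeneralPath();
    path.moveTo(0.0, 0.0);
    path.lineTo(10.0, 0.0);
    path.lineTo(10.0, 5.0);
    path.closePath();

    for (Line2D.Double line : extractLines(path, 0.5)) {
      System.out.println(line.getP1() + " -> " + line.getP2());
    }
  }
}
```

The end points of each Line2D give you the points; if you need the original curve control points instead of an approximation, call shape.getPathIterator(null) without the flatness argument and handle SEG_QUADTO and SEG_CUBICTO as well.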