Tags: java, pdf, itext, pdf-parsing, pdfrenderer

Get the extreme right, left, top, and bottom positions of an image - iText


I am setting margins for a PDF and checking whether the contents of each page exceed them.

I am easily able to do that if the contents of a page are just text.

Here's what I am doing:

I am using TextMarginFinder. I set the left margin value of the PDF based on the book size and compare it with finder.getLlx(), which returns the leftmost position of any text on that page.

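// assumption: reader is an open PdfReader and i is the current page number
TextMarginFinder finder = new PdfReaderContentParser(reader).processContent(i, new TextMarginFinder());
if (leftmar >= finder.getLlx())
{
    errormargin = 1; // left margin error
    System.out.println("Page: " + i + " Margin Error: LeftMarginError");
}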

But this does not work if the page contains an image. Even when the image extends beyond the margin, the code above reports no error, since finder.getLlx() seems to consider only text.

Two Questions:

1) While looping through the pages of the PDF, how can I check whether a particular page contains an image?

2) If it contains an image, how can I obtain its extreme positions?

Update after mkl's suggestion

if(leftmar >= finder.getLlx()){
    errormargin=1; //left margin error
    System.out.println("finder.getLlx() value ="+finder.getLlx()+", leftmar Value="+leftmar);
}

if(rightmar <= finder.getUrx()){
    errormargin=1; //right margin error
    System.out.println("finder.getUrx() value ="+finder.getUrx()+", rightmar Value="+rightmar);
}


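// assumption: margintop is a y coordinate in PDF user space (origin at the bottom left),
// so content crosses the top margin when its upper edge reaches or exceeds margintop
if(margintop <= finder.getUry()){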
    errormargin=3; //top margin error
    System.out.println("finder.getUry() value ="+finder.getUry()+", margintop Value="+margintop);
}


if(marginbottom >= finder.getLly()){
    errormargin=3; //bottom margin error
    System.out.println("finder.getLly() value ="+finder.getLly()+", marginbottom Value="+marginbottom);
}
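
For illustration only, the four checks above could be gathered into one helper that reports every violated side instead of overwriting a single error code. This is a minimal sketch, not part of the original post: the names `MarginCheck`, `Side`, and `violations` are hypothetical, and it assumes all four margins are given as coordinates in PDF user space (origin at the bottom left), which the post does not state explicitly.

```java
import java.util.EnumSet;

/** Hypothetical helper, not part of iText: compares a bounding box with four margin coordinates. */
final class MarginCheck {
    enum Side { LEFT, RIGHT, TOP, BOTTOM }

    /**
     * llx, lly, urx, ury describe the content bounding box (e.g. from finder.getLlx() etc.).
     * Assumption: leftmar, marginbottom, rightmar and margintop are coordinates in PDF
     * user space, so "too far left" means llx <= leftmar and "too high" means ury >= margintop.
     */
    static EnumSet<Side> violations(float llx, float lly, float urx, float ury,
                                    float leftmar, float marginbottom,
                                    float rightmar, float margintop) {
        EnumSet<Side> result = EnumSet.noneOf(Side.class);
        if (leftmar >= llx) result.add(Side.LEFT);
        if (rightmar <= urx) result.add(Side.RIGHT);
        if (margintop <= ury) result.add(Side.TOP);
        if (marginbottom >= lly) result.add(Side.BOTTOM);
        return result;
    }
}
```

Returning a set keeps the left and right errors distinguishable, which a single shared error code cannot do.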

Solution

  • This is more an answer to what the OP actually wanted: a way to retrieve the bounding box of all content on a page.

    The OP already uses the iText TextMarginFinder render listener class to determine the bounding box of the text on a page. In the context of this answer, an analogous class, MarginFinder, has been developed which considers not only text but also other kinds of content, e.g. bitmap images and vector graphics.

    Thus, replacing TextMarginFinder with MarginFinder makes it possible to find the bounding box of any content on the page.
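
    To make the idea concrete, here is a rough sketch of how such a bounding box can be accumulated. It is not the MarginFinder source. It relies on one fact from the PDF imaging model: an image is painted into the unit square, which the current transformation matrix [a b c d e f] maps onto the page, so the four corners follow directly from the matrix. The names `ContentBounds`, `include`, and `includeImage` are hypothetical, and the matrix is passed as six plain numbers rather than as an iText object.

```java
/** Hypothetical helper: a rectangle that grows to cover every point it is given. */
final class ContentBounds {
    private float llx = Float.POSITIVE_INFINITY, lly = Float.POSITIVE_INFINITY;
    private float urx = Float.NEGATIVE_INFINITY, ury = Float.NEGATIVE_INFINITY;

    /** Extends the box so that it contains the point (x, y). */
    void include(float x, float y) {
        llx = Math.min(llx, x);
        lly = Math.min(lly, y);
        urx = Math.max(urx, x);
        ury = Math.max(ury, y);
    }

    /**
     * Extends the box by an image drawn with the matrix [a b c d e f].
     * A point (x, y) of the unit square maps to (a*x + c*y + e, b*x + d*y + f),
     * so the corners (0,0), (1,0), (0,1), (1,1) map to the four points below.
     */
    void includeImage(float a, float b, float c, float d, float e, float f) {
        include(e, f);
        include(a + e, b + f);
        include(c + e, d + f);
        include(a + c + e, b + d + f);
    }

    /** True as long as nothing has been included yet. */
    boolean isEmpty() { return llx > urx; }

    float getLlx() { return llx; }
    float getLly() { return lly; }
    float getUrx() { return urx; }
    float getUry() { return ury; }
}
```

    For example, an image scaled to 200 x 100 units and placed at (50, 700) is included with `includeImage(200, 0, 0, 100, 50, 700)`, after which the box spans from (50, 700) to (250, 800). Text and path points would feed the same `include` method.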

    Please be aware:

    • All content is considered; the margin finder does not check whether the content makes a visible difference. White text, white bitmap areas, and white rectangles, for example, all count as content, so the bounding box encompasses such invisible content, too. White rectangles in particular can be a problem, as some software first paints a white rectangle over the whole page area.

    • Clipping paths are not considered. Thus, even content that is never drawn (because it is clipped away) makes the bounding box expand.

    • Page borders are not considered, either. Thus, off-page content like printer marks may make the bounding box expand even more.

    • The code calculating the bounding box for vector graphics is not exact: it simply returns the bounding box of all control points, which in the case of Bézier curves may be larger than the curve itself. It also ignores line widths and wedge types, which results in somewhat-off coordinates.

    • Annotations are not considered. Thus, the resulting bounding box may be too small if annotations are expected to be considered as well, e.g. for forms.
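
    The Bézier caveat can be illustrated with a little arithmetic. A cubic Bézier curve always stays inside the bounding box of its four control points, but it need not touch that box, so the control-point box can overestimate the curve. The sketch below computes the exact range of one coordinate by locating where the derivative is zero. It is an illustration under assumptions, not code from MarginFinder, and the names `BezierBounds`, `range`, and `value` are hypothetical.

```java
/** Hypothetical helper: exact extent of one coordinate of a cubic Bezier curve. */
final class BezierBounds {

    /** Evaluates the curve coordinate at parameter t (0 <= t <= 1). */
    static double value(double p0, double p1, double p2, double p3, double t) {
        double u = 1 - t;
        return u * u * u * p0 + 3 * u * u * t * p1 + 3 * u * t * t * p2 + t * t * t * p3;
    }

    /** Returns {min, max} of the coordinate over the whole curve. */
    static double[] range(double p0, double p1, double p2, double p3) {
        // The end points always lie on the curve.
        double min = Math.min(p0, p3);
        double max = Math.max(p0, p3);

        // The derivative divided by 3 is a*t^2 + b*t + c.
        double a = p3 - 3 * p2 + 3 * p1 - p0;
        double b = 2 * (p2 - 2 * p1 + p0);
        double c = p1 - p0;

        double[] roots;
        if (Math.abs(a) < 1e-12) {
            // Degenerates to a linear equation (or a constant).
            roots = Math.abs(b) < 1e-12 ? new double[0] : new double[] { -c / b };
        } else {
            double disc = b * b - 4 * a * c;
            if (disc < 0) {
                roots = new double[0];
            } else {
                double s = Math.sqrt(disc);
                roots = new double[] { (-b + s) / (2 * a), (-b - s) / (2 * a) };
            }
        }

        // Only extrema strictly inside the parameter interval matter.
        for (double t : roots) {
            if (t > 0 && t < 1) {
                double v = value(p0, p1, p2, p3, t);
                min = Math.min(min, v);
                max = Math.max(max, v);
            }
        }
        return new double[] { min, max };
    }
}
```

    For a curve whose y coordinates are 0, 100, 100, 0, the control points suggest a height of 100, while `range(0, 100, 100, 0)` yields a maximum of 75, reached at t = 0.5.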

    In spite of these shortcomings, the render listener usually returns correct results. If this is not enough, the class can be extended accordingly.
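
    As one example of such an extension, the computed box could be intersected with the page rectangle so that off-page content such as printer marks no longer enlarges it. The following is only a sketch under assumptions: rectangles are plain arrays in the order {llx, lly, urx, ury}, the page rectangle would have to be read from the document separately, and the names `PageClip` and `intersect` are hypothetical.

```java
/** Hypothetical helper: limits a content bounding box to the page area. */
final class PageClip {

    /**
     * Both rectangles are given as {llx, lly, urx, ury}.
     * Returns their intersection, or null if they do not overlap at all.
     */
    static float[] intersect(float[] content, float[] page) {
        float llx = Math.max(content[0], page[0]);
        float lly = Math.max(content[1], page[1]);
        float urx = Math.min(content[2], page[2]);
        float ury = Math.min(content[3], page[3]);
        if (llx > urx || lly > ury) {
            return null; // the content lies completely outside the page
        }
        return new float[] { llx, lly, urx, ury };
    }
}
```

    Content that runs off the page still reaches the page edge after clipping, so a margin check on the clipped box continues to flag it.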

    PS: Anyone who is interested in the original question may find answers in the MarginFinder render listener class and its use.