pdf pdf-generation acrobat pdf-reader

Is it possible to split a PDF file into pieces smaller than a page?


I have found that there are a lot of tools available for breaking big PDF files into smaller ones by splitting the original PDF file page-wise. For example, if I have a 10-page PDF document, I can break the original PDF file into 10 pieces by splitting it page-wise.

But I want a similar kind of tool that breaks the PDF file into pieces smaller than a page. That is, I need to split a PDF page into different documents based on some parameter such as paragraph, section, element...

For example, if my PDF file has 2 pages with 10 paragraphs, I would like to split it into 10 separate PDF files based on the paragraph parameter...

Also, I strongly believe that PDF does not contain any structure like Open XML. But I am also wondering:


How are the tools able to break PDF files into smaller PDF files by splitting page-wise?
What kind of mechanism do they use for page-wise splitting of a PDF file?

So, is there any way to do this? Please give me your suggestions.


Solution

  • PDF is a vector-based document description language. It is page-based, so in a sense every page is independent of the others, which is why splitting page-wise is fairly easy. Unlike a raster image, from which you can extract a small subset independently, with a PDF you have to render the whole page to know what a small subset looks like.

    Say you have a page (black) that contains a complex-shaped object (here it is a line, but it could be any text, shape, image, etc.) and you want to extract a subset (red). You would first have to find all the objects that produce visible output in the region of interest. Then you would have to modify them so that they render correctly (in this case, calculate the green points from the blue points while preserving the shape of the object).

    Complex shape on a page
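
To give a feel for the "calculate the new end points" step, here is a minimal sketch for the simplest possible case only: a single straight line segment clipped to a rectangular region. It uses the Liang-Barsky line-clipping technique, which is my choice for illustration and not something any particular PDF tool is known to use. The function name and coordinate convention are hypothetical, and real PDF content (curves, text, images, transforms) is far more involved.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def clip_segment(p0: Point, p1: Point, rect: Rect) -> Optional[Tuple[Point, Point]]:
    """Clip the segment p0-p1 to rect (Liang-Barsky).

    Returns the end points of the part that lies inside the region,
    or None if the segment does not touch the region at all.
    """
    xmin, ymin, xmax, ymax = rect
    x0, y0 = p0
    dx, dy = p1[0] - x0, p1[1] - y0
    t_enter, t_exit = 0.0, 1.0  # parameter range of the visible part

    # One (p, q) pair per rectangle edge: left, right, bottom, top.
    for p, q in ((-dx, x0 - xmin), (dx, xmax - x0),
                 (-dy, y0 - ymin), (dy, ymax - y0)):
        if p == 0:
            if q < 0:
                return None  # parallel to this edge and outside it
            continue
        t = q / p
        if p < 0:
            t_enter = max(t_enter, t)  # segment enters through this edge
        else:
            t_exit = min(t_exit, t)    # segment leaves through this edge
        if t_enter > t_exit:
            return None

    return ((x0 + t_enter * dx, y0 + t_enter * dy),
            (x0 + t_exit * dx, y0 + t_exit * dy))
```

For a diagonal line from (0, 0) to (8, 8) and a region (2, 2, 6, 6), the function returns the two points where the line crosses the region boundary. Doing the same for every object type on a page is what makes this approach hard.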

    An easier approach would be to include the whole page and clip the viewing area to the dimensions of the region.

    You could do this with pdfjam. Check the --trim, --offset and --delta options in conjunction with a custom paper size (Examples 6 and 7 on the pdfjam website). You would still have to calculate the coordinates of the region of interest somehow, though.
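
Once the region's coordinates are known, turning them into trim values is simple arithmetic. The sketch below assumes the region is given in PDF units (1/72 inch, which TeX calls `bp`) with the origin at the bottom-left corner of the page, and that `--trim` takes its four values in the order left, bottom, right, top. The helper name is hypothetical, and the `--clip`, `--papersize` and `--outfile` options and the page-selection argument are written from memory of the pdfjam documentation, so please check them against `pdfjam --help` before relying on this.

```python
import shlex
from typing import List, Tuple


def pdfjam_crop_command(page_size: Tuple[float, float],
                        region: Tuple[float, float, float, float],
                        src: str, dst: str, page: int = 1) -> List[str]:
    """Build a pdfjam command that keeps only `region` of one page.

    page_size: (width, height) of the page in bp.
    region:    (x0, y0, x1, y1) in bp, origin at the bottom-left corner.
    """
    page_w, page_h = page_size
    x0, y0, x1, y1 = region
    if not (0 <= x0 < x1 <= page_w and 0 <= y0 < y1 <= page_h):
        raise ValueError("region must lie inside the page")

    # Amount to cut away from each side of the page.
    left, bottom = x0, y0
    right, top = page_w - x1, page_h - y1

    trim = f"{left:g}bp {bottom:g}bp {right:g}bp {top:g}bp"
    # Custom paper size equal to the size of the region itself.
    papersize = f"{{{x1 - x0:g}bp,{y1 - y0:g}bp}}"

    # Option names are assumptions; verify with `pdfjam --help`.
    return ["pdfjam", src, str(page),
            "--trim", trim, "--clip", "true",
            "--papersize", papersize,
            "--outfile", dst]


# Example: cut a 451 x 100 bp strip out of an A4-sized page (595 x 842 bp).
cmd = pdfjam_crop_command((595, 842), (72, 600, 523, 700),
                          "input.pdf", "paragraph-01.pdf")
print(shlex.join(cmd))
```

The command is returned as a list so that it can be passed straight to `subprocess.run` without shell quoting problems. Finding the region coordinates for each paragraph remains the hard part and is not covered here.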