matlab pdf

Total number of pages in a PDF document


MATLAB provides the extractFileText function which allows us to read text from PDF files, among other file formats, and save the extracted text as a string.

We can pass an extra argument to this function in order to extract text from specific pages of the document.

For example, to extract the text of pages 3, 5 and 7 of the sample exampleSonnets.pdf file:

str = extractFileText("exampleSonnets.pdf", 'Pages', [3 5 7]);

This function, however, does not provide a way of finding out the total number of pages that the PDF document contains beforehand.

So if we happen to do something like:

str = extractFileText("exampleSonnets.pdf", 'Pages', [99 100]);

The following error is thrown:

Error using extractFileText (line 95)
No page 100 in file. Maximum page number: 47.

This tells us that we have requested a page number that exceeds the total number of pages in the document.

This is fine.

However, how can I know the total number of pages in a PDF document beforehand, without triggering the error, so that I can safely narrow my searches to the maximum page number?

Is there a function for this purpose?


Solution

  • I'm not aware of a way that would let you do this. But you can use try/catch to handle the situation directly, without knowing the number of pages beforehand.

    If you do need to know the number of pages beforehand, you could request pages one at a time inside a try/catch until the call errors (fine for small PDFs), or run e.g. a binary search over the page number in the same way.
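
The binary-search idea could be sketched roughly as follows. This is only a minimal sketch, not a built-in facility: countPdfPages, pageExists and largestTrue are hypothetical helper names, and the sketch assumes that extractFileText errors for every page number beyond the last page and succeeds for every page up to it. It first doubles the probe page until a request fails, then bisects between the last page that worked and the first that did not.

```matlab
function n = countPdfPages(filename)
% Hypothetical helper (not part of MATLAB): estimates the number of pages
% in a PDF by probing extractFileText and catching the resulting errors.
    hasPage = @(p) pageExists(filename, p);
    n = largestTrue(hasPage);
end

function tf = pageExists(filename, p)
% True if page p can be extracted without an error.
% Caveat: ANY error (e.g. a missing file) is treated as "no such page".
    try
        extractFileText(filename, 'Pages', p);
        tf = true;
    catch
        tf = false;
    end
end

function n = largestTrue(pred)
% Largest n >= 0 such that pred(n) is true, assuming pred is true for
% 1..n and false for everything above n.
    if ~pred(1)
        n = 0;
        return
    end
    lo = 1;                 % largest page known to exist
    hi = 2;                 % candidate upper bound
    while pred(hi)          % double until a request fails
        lo = hi;
        hi = 2 * hi;
    end
    % Invariant: pred(lo) is true and pred(hi) is false.
    while hi - lo > 1
        mid = floor((lo + hi) / 2);
        if pred(mid)
            lo = mid;
        else
            hi = mid;
        end
    end
    n = lo;
end
```

Saved as countPdfPages.m, this could be called as n = countPdfPages("exampleSonnets.pdf"); and a page list could then be clamped with pages = pages(pages <= n) before passing it to extractFileText. Each probe extracts a full page of text, so the search costs on the order of 2*log2(n) extractions rather than n, which is the reason to prefer it over a linear scan on large documents.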