Search code examples
pythonpython-3.xxmlpython-docx

Identifying a cover page in word document .docx file using python-docx


I am interested in knowing if it is at all possible to identify any elements in the xml of a .docx file that could be used to distinguish the cover page of a word document.

The only identifiable element I have found thus far to denote a cover page is the page break.

<w:br type="page"> 

However this is the same page break that can be found throughout the document.

The purpose of this question is that I hope to number the pages of the a word document from the second page if a cover page is present in the word file, otherwise start at page 1.


Solution

    • Maybe there is a paragraph style that appears only on a cover page (perhaps "Title")
    • Maybe the cover page is its own section and has a header or footer that is distinctive
    • Maybe there is a distinctive header or footer that appears as "first-page-only" (one of the three header/footer types).
    • Maybe there is a common string or string pattern in the text.

    These are some ideas, but the short story is you would need to detect such a thing based on the content. There is no concept of a "cover-page" in Word that is different from other pages.