Python Win32Com First page Header/Footer


Some Context

I am trying to scrape information from a large number of .doc files, so I've written a Python program to do the heavy lifting for me. Word lets the first page of a document have a different header and footer from the rest of the document. This is generally useful, but it causes a problem for which I have not found a good solution.

This is how I am accessing headers and footers:

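import win32com.client  # importing win32com alone does not load the client submodule

word_app = win32com.client.Dispatch('Word.Application')
doc = word_app.Documents.Open('path/to/my/word/file.docx')
# Footers(1) is the primary footer (wdHeaderFooterPrimary)
first_footer = doc.Sections(1).Footers(1).Range.Text
print(first_footer)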

There is a catch, though: the first page's header and footer contain some content that is common to the whole document, but also some content that is unique to the first page. The code above does not capture the unique content: it returns only the header/footer text that is shared across the document.

When the first page has unique content in its header and footer, how do I access it using Python's win32com?


Solution

  • After some digging, I have found an answer.

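    It turns out you need to pass the constant wdHeaderFooterFirstPage (numeric value 2), available through win32com.client.constants, instead of 1 to reach the first-page header or footer. Note that win32com.client.constants is only populated once Word's type library has been loaded, for example by creating the application with win32com.client.gencache.EnsureDispatch; otherwise the lookup raises an AttributeError. The call looks like so: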

    doc.Sections(1).Headers(win32com.client.constants.wdHeaderFooterFirstPage).Range.Text
    

    This returns a string which you can manipulate like any other; use Footers in place of Headers to get the first-page footer. Documentation for win32com is hard to find, and translating the VBA documentation into Python is not as obvious as I would like it to be.
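
For reference, here is a minimal sketch of the approach above that reads both the first-page header and footer. The helper name first_page_header_footer and the fallback to the primary header/footer are my own assumptions, not something from the answer; the numeric values 1 and 2 are those of Word's WdHeaderFooterIndex enumeration, and Exists is the HeaderFooter property that reports whether the section actually has a distinct first-page variant.

```python
# Numeric values of Word's WdHeaderFooterIndex enumeration; using them directly
# avoids depending on win32com.client.constants being populated.
WD_HEADER_FOOTER_PRIMARY = 1
WD_HEADER_FOOTER_FIRST_PAGE = 2


def first_page_header_footer(doc, section_index=1):
    """Return the first-page header and footer text of one section.

    Hypothetical helper: `doc` is an open Word Document COM object.
    Falls back to the primary header/footer when the section has no
    distinct first-page variant.
    """
    section = doc.Sections(section_index)
    result = {}
    for name, collection in (("header", section.Headers),
                             ("footer", section.Footers)):
        first = collection(WD_HEADER_FOOTER_FIRST_PAGE)
        # Exists is False when "Different first page" is switched off.
        chosen = first if first.Exists else collection(WD_HEADER_FOOTER_PRIMARY)
        result[name] = chosen.Range.Text
    return result


def main(path):
    # Imported here so the helper above can be imported without pywin32.
    import win32com.client

    word_app = win32com.client.Dispatch("Word.Application")
    doc = word_app.Documents.Open(path)
    try:
        print(first_page_header_footer(doc))
    finally:
        doc.Close(False)  # close without saving changes
        word_app.Quit()   # assumption: this Word instance is ours to close
```

Calling main with the full path to a document prints a dict with "header" and "footer" keys. The returned text usually ends with Word's paragraph mark ("\r"), so you will probably want to call .strip() on it before comparing or storing it.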