How to increment paragraph object in word document using python-docx?


I'm searching Word documents for descriptions of things written in them. The documents are not all formatted the same way, but one thing is consistent: the text block I want always comes right after the title 'Description'. So I'd search for 'Description' and then get the text of the next paragraph object after it. How can I increment the paragraph object (so to speak)?

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        doc = docx.Document(os.path.join(subdir, file))  # join with subdir so files in nested folders resolve
        for paragraph in doc.paragraphs:
            if 'Description' in paragraph.text:
                print(paragraph[i+1].text) #I know you can't do i+1 but
                                           #that's essentially what I want to do

Solution

  • A simple approach would be:

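    paragraphs = list(doc.paragraphs)

    # Stop one short of the end so paragraphs[i + 1] always exists,
    # even when 'Description' appears in the last paragraph
    for i in range(len(paragraphs) - 1):
        if 'Description' in paragraphs[i].text:
            print(paragraphs[i + 1].text)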
    

    If you know for sure that the description label appears in a paragraph with the Heading 1 style, you could additionally check the paragraph's style so you don't get false positives on a body paragraph that just happens to contain that word.
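
As a rough sketch of that heading check: python-docx paragraphs expose the style name as `paragraph.style.name`, so the match can be limited to paragraphs whose style is 'Heading 1'. The helper name `text_after_heading`, its parameters, and the `fake` stand-in objects are made up for illustration; the stand-ins only mimic the `.text` and `.style.name` attributes so the example runs without a .docx file.

```python
from types import SimpleNamespace


def text_after_heading(paragraphs, label="Description", style_name="Heading 1"):
    """Return the text of each paragraph that directly follows a matching heading.

    `paragraphs` is assumed to be a list of objects with `.text` and
    `.style.name`, such as the list returned by `doc.paragraphs`.
    """
    results = []
    # Pair every paragraph with the one after it; the last paragraph has no
    # successor, so it is never checked as a heading.
    for current, following in zip(paragraphs, paragraphs[1:]):
        is_heading = current.style.name == style_name
        if is_heading and label in current.text:
            results.append(following.text)
    return results


# Hypothetical stand-in for a python-docx paragraph (illustration only).
def fake(text, style):
    return SimpleNamespace(text=text, style=SimpleNamespace(name=style))


sample = [
    fake("Description", "Heading 1"),
    fake("A widget that frobnicates.", "Normal"),
    fake("See the Description above.", "Normal"),  # contains the word, wrong style
    fake("Unrelated text.", "Normal"),
]

print(text_after_heading(sample))  # ['A widget that frobnicates.']
```

With a real document, the same helper could presumably be called as `text_after_heading(doc.paragraphs)`. If your documents put an empty paragraph between the heading and the body text, you would need to skip blank paragraphs before taking the next one.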