I'm searching Word documents to extract descriptions of things written in them. However, the documents are not all formatted the same way. One thing that is consistent is that the text block I want always comes right after the title 'Description'. So I'd search for 'Description' and then get the text of the paragraph object that follows it. How can I step to the next paragraph object (so to speak)?
```python
import os
import docx  # python-docx

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        # Join with subdir (not rootdir) so files in nested folders are found.
        doc = docx.Document(os.path.join(subdir, file))
        for paragraph in doc.paragraphs:
            if 'Description' in paragraph.text:
                print(paragraph[i+1].text)  # I know you can't do i+1, but
                                            # that's essentially what I want to do
```
A simple approach would be:
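```python
paragraphs = list(doc.paragraphs)
for i, paragraph in enumerate(paragraphs):
    # Guard against 'Description' appearing in the very last paragraph,
    # where paragraphs[i + 1] would raise an IndexError.
    if 'Description' in paragraph.text and i + 1 < len(paragraphs):
        print(paragraphs[i + 1].text)
```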
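The same idea can also be written without manual index arithmetic by pairing each paragraph with its successor via `zip`. This is a minimal sketch, not part of the original answer: the function name `texts_after_label` is hypothetical, and it accepts any sequence of objects with a `.text` attribute (such as the list from python-docx's `Document.paragraphs`), so the demo runs on stand-in objects instead of a real file.

```python
from types import SimpleNamespace


def texts_after_label(paragraphs, label="Description"):
    """Return the text of each paragraph that directly follows one containing `label`.

    Hypothetical helper: `paragraphs` may be any sequence of objects with a
    `.text` attribute, e.g. python-docx's `Document.paragraphs`.
    """
    paragraphs = list(paragraphs)
    results = []
    # zip pairs each paragraph with its successor; the last paragraph has no
    # successor, so it is never the `current` item and no IndexError can occur.
    for current, following in zip(paragraphs, paragraphs[1:]):
        if label in current.text:
            results.append(following.text)
    return results


if __name__ == "__main__":
    # Stand-in objects so the sketch runs without a .docx file.
    demo = [SimpleNamespace(text=t) for t in
            ["Title", "Description", "A widget that does X.", "Notes", "Description"]]
    print(texts_after_label(demo))  # ['A widget that does X.']
```

On a real file you would pass `docx.Document(path).paragraphs` instead of the stand-in list.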
If you know for sure that the 'Description' label always appears in a paragraph with the Heading 1 style, you can also check each paragraph's style, so you don't get false positives on a body paragraph that just happens to contain that word.
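A sketch of that style check, assuming python-docx, where `paragraph.style.name` gives the style's display name (for example `'Heading 1'`). The function `description_after_heading` and its `label`/`style_name` parameters are hypothetical; a document that uses a custom or differently named heading style would need a different `style_name`.

```python
from types import SimpleNamespace


def description_after_heading(paragraphs, label="Description", style_name="Heading 1"):
    """Return the text following each paragraph that contains `label` AND uses `style_name`.

    Hypothetical helper assuming python-docx Paragraph objects, i.e. objects
    with `.text` and `.style.name`. The style comparison is an exact string match.
    """
    paragraphs = list(paragraphs)
    results = []
    for current, following in zip(paragraphs, paragraphs[1:]):
        style = current.style
        is_heading = style is not None and style.name == style_name
        # Only a heading-styled paragraph counts as the 'Description' title.
        if is_heading and label in current.text:
            results.append(following.text)
    return results


if __name__ == "__main__":
    def para(text, style):
        # Stand-in for a python-docx Paragraph: only .text and .style.name are used.
        return SimpleNamespace(text=text, style=SimpleNamespace(name=style))

    demo = [
        para("The Description field is optional.", "Normal"),  # body text: ignored
        para("Description", "Heading 1"),
        para("A widget that does X.", "Normal"),
    ]
    print(description_after_heading(demo))  # ['A widget that does X.']
```

With a real document you would call `description_after_heading(docx.Document(path).paragraphs)`.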