Search code examples
pythondocxpython-docx

How to get to know paragraph color docx python?


I have document where some lines are highlighted. I have managed to open it and get text line by line:

doc = Document("/Users/an/PycharmProjects/projects/test.docx")

for para in doc.paragraphs:
   print(para.text)

but I don't know how to check which color has each line. Maybe this library has some method for such task?


Solution

  • A run can have a .highlight_color, which is perhaps what you're looking for:

    for p in doc.paragraphs:
        for r in p.runs:
            print(r.font.highlight_color)
    

    There is no concept of "line" in Word. There are paragraphs, within which there are runs. A run is a sequence of characters that shares the same character formatting (aka. "font").

    Note that breaks between runs occur arbitrarily in text and there's nothing stopping a "line" of highlighted text being broken into several runs. So you may need to check for adjacent runs with the same .highlight_color value to "assemble" those into the same sequence of text in the "apparent" highlighted passage.