Search code examples
pythondocxhighlight

How to get highlighted text in a docx file using python


I want to check a .docx document and get the info that is bold or highlighted, i know how to check if the text is bold and i don't know the attribute to get if it's higlighted.

Here in the code i get and append to the answers list the one it's bolded on the docx document, but i wanted to append de answer when it's bold or highlighted.

This is the code:

import docx

answers = []
def addAnswer(filename):
    global answers
    doc = docx.Document(filename)
    for paragraph in doc.paragraphs:
        for run in paragraph.runs:
            if run.bold: #Here I want to add the highlighted condition too
                answers.append("ANSWER: " + run.text[0].upper())

This is an example of the text I want to get: Highlighted text

I tried to use highlighted or highlight attribute on the code, like I did with "run.bold", but i didn't find anything and didn't work.


Solution

  • I think you just need to check the font highlight colour. Either it's None or it's the colour of the highlight.

    run.font.highlight_color is not None
    

    Example

    import docx
    
    answers = []
    def addAnswer(filename):
        global answers
        doc = docx.Document(filename)
        for paragraph in doc.paragraphs:
            for run in paragraph.runs:
                if run.bold and run.font.highlight_color is not None:  #Here I want to add the highlighted condition too
                    answers.append("ANSWER: " + run.text[0].upper())