
How to output the correct statement in a loop using python


I am currently working on some Python (version 3.10.4) code in PyCharm (Community Edition 2021.3.3) using the python-docx library (version 0.8.1.1). The code should determine whether text formatted in the 'Normal' style of a Word document uses a specific font (Times New Roman, or Times New Roman and Cambria Math). When I execute the code, the statements printed are not the ones I want.

In other words, if all of the 'Normal'-style text is in Times New Roman, it should print "Body text is in Times New Roman"; if the text contains both Times New Roman and Cambria Math, it should print "Body text is Times New Roman and Cambria Math"; and if the text is neither all Times New Roman nor a combination of Times New Roman and Cambria Math, it should print "Unrecognised body text font".

When I execute the code shown below, it prints a mixture of 'Body text is in Times New Roman' and 'Unrecognised body text font', each printed as many times as the corresponding occurrences appear in the document. The Word document contains the following fonts: Times New Roman, Cambria Math and Arial (Arial is used only for testing purposes). So it should print "Unrecognised body text font" once, since the text is neither all Times New Roman nor a combination of Times New Roman and Cambria Math.

import docx  # import the python-docx library
WordFile = docx.Document("my file directory")  # Word document file directory for python-docx to access

for paragraph in WordFile.paragraphs:
    name = []
    if 'Normal' == paragraph.style.name:
        for run in paragraph.runs:
            name.append(run.font.name)
            for i in name:
                if i == 'Times New Roman':   # checks if the elements in name = [] are 'Times New Roman'
                    print("Body text is in Times New Roman")  # print this statement if all elements in name = [] are 'Times New Roman'
                elif i == 'Times New Roman' and i == 'Cambria Math':  # checks if the elements in name = [] are 'Times New Roman' and 'Cambria Math'
                    print("Body text is Times New Roman and Cambria Math")  # print this if fonts in name = [] are both 'Times New Roman' and 'Cambria Math'
                else:
                    # print this if fonts in name = [] are neither all 'Times New Roman'
                    # nor a combo of 'Times New Roman' and 'Cambria Math'
                    print("Unrecognised body text font")

I believe the problem lies in the loops that check whether all elements in the list name are of a particular font. Each print statement should only execute if all elements in the list satisfy its condition, and only one of the statements should be printed, not a combination as currently produced. But I cannot seem to solve this issue. Any help would be appreciated. A picture of the currently produced output is attached.
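A minimal sketch that replays the loop logic above on a plain list makes the behavior visible without a Word file. The list of run fonts is a hypothetical example, and `replay_original_logic` is an illustrative helper, not part of python-docx.

```python
def replay_original_logic(run_fonts):
    """Replay the question's nested loops on a list of run font names.

    run_fonts is a hypothetical stand-in for [run.font.name for run in paragraph.runs].
    Returns the messages in the order the original code would print them.
    """
    messages = []
    name = []
    for font in run_fonts:
        name.append(font)
        # The check sits inside the run loop, so the whole list is re-scanned
        # (and one message is produced per element) after every append.
        for i in name:
            if i == 'Times New Roman':
                messages.append("Body text is in Times New Roman")
            elif i == 'Times New Roman' and i == 'Cambria Math':
                # Never reached: a single string cannot equal two different values.
                messages.append("Body text is Times New Roman and Cambria Math")
            else:
                messages.append("Unrecognised body text font")
    return messages


for message in replay_original_logic(['Times New Roman', 'Arial', 'Times New Roman']):
    print(message)
```

Three runs already produce six messages, a mix of the first and third statements, and the "Times New Roman and Cambria Math" branch can never fire.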


Solution

  • I think you're looking for this kind of functionality:

    # return True if both Times New Roman and Cambria Math appear in your final list
    all(i in name for i in ['Times New Roman', 'Cambria Math'])
    

    or maybe:

    # return True if *only* Times New Roman and Cambria Math appear in the final list
    all(i in ['Times New Roman', 'Cambria Math'] for i in name)
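The difference between the two checks is easiest to see on a few sample lists. The lists below are made-up examples, not values read from a document.

```python
# Hypothetical lists of collected font names (not read from a real document).
samples = {
    "tnr_and_math": ['Times New Roman', 'Cambria Math', 'Times New Roman'],
    "tnr_only": ['Times New Roman', 'Times New Roman'],
    "with_arial": ['Times New Roman', 'Cambria Math', 'Arial'],
}

required = ['Times New Roman', 'Cambria Math']

results = {}
for label, name in samples.items():
    # Variant 1: both required fonts appear somewhere in the list.
    both_present = all(i in name for i in required)
    # Variant 2: every font in the list is one of the allowed fonts.
    only_allowed = all(i in required for i in name)
    results[label] = (both_present, only_allowed)
    print(f"{label}: both_present={both_present}, only_allowed={only_allowed}")
```

Neither check alone separates the three outcomes in the question: the first accepts the list that contains Arial, and the second also accepts an all-Times New Roman list (and an empty list), so the final decision needs both conditions or an exact comparison.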
    

    Without understanding the rest of the logic, there appear to be other issues in the code:

    • This font check should probably be de-indented so that it runs only once, after collecting the font names for each paragraph (or for the whole document?), instead of once for every run.
    • Every run's font name is appended to the list even when it is already there, which causes duplication and slows down the final list iteration. Is an analysis of this full list ultimately needed? If not, consider checking for membership before appending, or use another data type such as a set to avoid duplication.
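As a minimal sketch of the second point, assuming the per-run font names have already been read into a list (the values below are hypothetical), a set keeps each name once. `None` stands for a run with no directly applied font, which is what python-docx's `run.font.name` returns in that case.

```python
# Hypothetical per-run font names; None stands for a run with no directly applied font.
run_fonts = ['Times New Roman', None, 'Times New Roman', 'Cambria Math', 'Times New Roman']

# A list keeps every occurrence, so later checks scan the same names repeatedly.
name_list = [f for f in run_fonts if f is not None]

# A set keeps each font name once; set.add ignores names that are already present.
name_set = set()
for f in run_fonts:
    if f is not None:
        name_set.add(f)

print(len(name_list), sorted(name_set))
```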

    For now, here is a simple revision of your logic that achieves the desired behavior. Instead of the all() checks, it collects the font names into a set and compares that set exactly against the expected combinations; choose whichever kind of check fits your use case.

    import docx  # import the python-docx library
    
    # Word document file directory for python-docx to access
    WordFile = docx.Document("test1.docx")
    
    font_names = set()
    for paragraph in WordFile.paragraphs:
        if "Normal" == paragraph.style.name:
            for run in paragraph.runs:
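                # run.font.name is None when the run has no directly applied font; skip those runs.
                # set.add ignores names that are already present, so no membership check is needed.
                if run.font.name is not None:
                    font_names.add(run.font.name)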
    
    if {"Times New Roman", "Cambria Math"} == font_names:
        # print this if the fonts in font_names are exactly 'Times New Roman' and 'Cambria Math'
        print("Body text is Times New Roman and Cambria Math")
    elif {"Times New Roman"} == font_names:
        # print this statement if every font in font_names is 'Times New Roman'
        print("Body text is in Times New Roman")
    else:
        print("Unrecognised body text font")
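One possible refinement, sketched under stated assumptions: split the collection and the decision into two functions so the decision logic can be checked with plain lists. `classify_body_fonts` and `collect_normal_fonts` are hypothetical helper names. The fallback to the paragraph style's font when `run.font.name` is None is an assumption added here, not part of the answer above; fonts inherited from document defaults or the theme are still reported as None and skipped.

```python
def classify_body_fonts(font_names):
    """Map a collection of font names to one of the three messages from the question."""
    fonts = set(font_names)
    if fonts == {"Times New Roman"}:
        return "Body text is in Times New Roman"
    if fonts == {"Times New Roman", "Cambria Math"}:
        return "Body text is Times New Roman and Cambria Math"
    # Anything else, including an empty set (no explicit fonts found), is unrecognised.
    return "Unrecognised body text font"


def collect_normal_fonts(document):
    """Collect font names used by runs in 'Normal'-style paragraphs of a python-docx Document.

    Assumption: a run without a directly applied font (run.font.name is None)
    falls back to the paragraph style's font; if that is also None, the run is skipped.
    """
    fonts = set()
    for paragraph in document.paragraphs:
        if paragraph.style.name != "Normal":
            continue
        style_font = paragraph.style.font.name
        for run in paragraph.runs:
            font = run.font.name or style_font
            if font is not None:
                fonts.add(font)
    return fonts


print(classify_body_fonts(["Times New Roman", "Cambria Math", "Times New Roman"]))
print(classify_body_fonts(["Times New Roman", "Arial"]))
```

With a real file, the two would be combined as `classify_body_fonts(collect_normal_fonts(docx.Document("test1.docx")))`, which prints exactly one message for the whole document.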