Tags: ms-word, duplicates, python-3.6, python-docx

How to avoid duplicates in python-docx?


The program adds a heading (the current date) to a document, and I want to skip adding it if that heading is already in the document. My code adds the heading but also creates duplicates. What should I change in my code so that the program avoids duplicates?

date = datetime.today().strftime('%A, %d. %B %Y')
document = Document('example.docx')
def duplicate(document):
    for paragraph in document.paragraphs:
        if date not in paragraph.text:
            document.add_heading(date)
            document.save('example.docx')
duplicate(document)

Solution

  • There are several problems with the code in the question:

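    1. Depending on how datetime was imported, the date call may be wrong. After import datetime it should be datetime.date.today().strftime('%A, %d. %B %Y'); datetime.today() only works after from datetime import datetime.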
    2. Your code checks each paragraph separately and adds a heading for every paragraph that does not contain the date. So even if one paragraph already has the date, if date not in paragraph.text: is still true for all the other paragraphs, and a heading is added for each of them.
    3. document.save('example.docx') only needs to run once, after you have finished changing the document. Inside for paragraph in document.paragraphs: it saves the file again on every matching iteration for no reason.
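
To make point 2 concrete, here is a small sketch that mimics the question's loop with plain strings instead of a real document. The function name headings_added_by_loop and the sample paragraph texts are made up for illustration; it only counts how many headings the loop would add.

```python
def headings_added_by_loop(paragraph_texts, date):
    """Count the headings the question's loop would add.

    paragraph_texts stands in for [p.text for p in document.paragraphs];
    no python-docx objects are involved in this sketch.
    """
    added = 0
    for text in paragraph_texts:
        # Same test as in the question: it looks at ONE paragraph at a time,
        # so every paragraph without the date triggers another heading.
        if date not in text:
            added += 1
    return added


# One paragraph already contains the date, yet two headings would be added.
count = headings_added_by_loop(["Intro", "Monday, 01. January 2018", "Notes"],
                               "Monday, 01. January 2018")

# With no paragraphs at all, the loop body never runs and nothing is added.
empty_count = headings_added_by_loop([], "Monday, 01. January 2018")
```

The second call hints at another side effect of the per-paragraph check: with no paragraphs to loop over, the heading would never be added at all.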

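    If you want to add the heading only when the date does not appear anywhere in the document, you can do something like this (there are many other ways of doing it, but this seems cleanest to me). document.element.xml is the document's XML serialized as a string, so find() returns -1 when the date is absent: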

    if document.element.xml.find(date) == -1:
        document.add_heading(date)
    document.save('example.docx')
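
One of the other ways mentioned above, sketched here under some assumptions: instead of searching the raw XML, check the text of every paragraph first and add the heading only if none of them contains the date. The helper names heading_exists, add_heading_once and update_file are hypothetical, not part of python-docx; the only python-docx calls used are Document(), document.paragraphs, paragraph.text, document.add_heading() and document.save().

```python
def heading_exists(document, text):
    """Return True if any paragraph (headings included) contains text."""
    return any(text in paragraph.text for paragraph in document.paragraphs)


def add_heading_once(document, text):
    """Add a heading only when no paragraph already contains text.

    Returns True if a heading was added, False otherwise.
    """
    if heading_exists(document, text):
        return False
    document.add_heading(text)
    return True


def update_file(path, text):
    """Open the file at path, add the heading if missing, and save once."""
    from docx import Document  # python-docx, imported here so the helpers above need no library

    document = Document(path)
    if add_heading_once(document, text):
        document.save(path)  # save only when something actually changed
```

With the question's setup this would be called as update_file('example.docx', date). Checking paragraph.text compares the visible text, so it should not be thrown off by how Word stores the heading internally, at the cost of ignoring text outside the body paragraphs (tables, headers, footers).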