Search code examples
xmlpython-docx

Edit content in header of document Python-docx


I am trying to find and replace text in a Textbox in Header of document. But after searching for awhile, it seems there is no way to access the content in the Header or "float" text boxes via python-docx (I read issue here)

So, it means we have to find and replace directly on the xml format of document. Do you know anyway to do that?


Solution

  • I found a way to solve this problem. For instance, I have a template.docx file, and I want to change text in a Textbox in Header as described above. Flow steps below, I solved my problems:

    1. Rename file template.docx to template.zip
    2. Unzip template.zip to template folder
    3. Find and replace text I want to change in one of the header<number>.xml files in /template/word/ folder.
    4. Zip all files in /template folder back to template.zip
    5. Rename template.zip back to template.docx

    I used Python to manipulate these

    import os
    import shutil
    import zipfile
    
    WORKING_DIR = os.getcwd()
    TEMP_DOCX = os.path.join(WORKING_DIR, "template.docx")
    TEMP_ZIP = os.path.join(WORKING_DIR, "template.zip")
    TEMP_FOLDER = os.path.join(WORKING_DIR, "template")
    
    # remove old zip file or folder template
    if os.path.exists(TEMP_ZIP):
        os.remove(TEMP_ZIP)
    if os.path.exists(TEMP_FOLDER):
        shutil.rmtree(TEMP_FOLDER)
    
    # reformat template.docx's extension
    os.rename(TEMP_DOCX, TEMP_ZIP)
    
    # unzip file zip to specific folder
    with zipfile.ZipFile(TEMP_ZIP, 'r') as z:
        z.extractall(TEMP_FOLDER)
    
    # change header xml file
    header_xml = os.path.join(TEMP_FOLDER, "word", "header1.xml")
    xmlstring = open(header_xml, 'r', encoding='utf-8').read()
    xmlstring = xmlstring.replace("#TXTB1", "Hello World!")
    with open(header_xml, "wb") as f:
        f.write(xmlstring.encode("UTF-8"))
    
    # zip temp folder to zip file
    os.remove(TEMP_ZIP)
    shutil.make_archive(TEMP_ZIP.replace(".zip", ""), 'zip', TEMP_FOLDER)
    
    # rename zip file to docx
    os.rename(TEMP_ZIP, TEMP_DOCX)
    shutil.rmtree(TEMP_FOLDER)