How to add w:altChunk and its relationship with python-docx


I have a use case that relies on the <w:altChunk/> element in a Word document: it injects (fragments of) an HTML file as alternate chunks and lets Word do its work when the file is opened. The current implementation uses XML/XSL to compose the WordML XML, modify relationships, and do all the packaging manually, which is a real pain.

I want to move to python-docx, but the API doesn't support this directly. So far I have found a way to add the <w:altChunk/> element to the document XML, but I am still struggling to find a way to add the relationship and the related file to the package.

I think I should create a compatible part and pass it to document.part.relate_to() to do the job, but I still can't figure out how to build that part:

from docx import Document
from docx.oxml import OxmlElement
from docx.oxml.ns import qn  # qn lives in docx.oxml.ns, not docx.oxml
from docx.opc.constants import RELATIONSHIP_TYPE as RT

def add_alt_chunk(doc: Document, chunk_part):
    '''TODO: figure out how to add the file and its relationship.'''
    r_id = doc.part.relate_to(chunk_part, RT.A_F_CHUNK)
    alt = OxmlElement('w:altChunk')
    alt.set(qn('r:id'), r_id)
    doc.element.body.sectPr.addprevious(alt)

Update:

As per scanny's advice, below is my working code. Thank you very much Steve!

from docx import Document
from docx.oxml import OxmlElement
from docx.oxml.ns import qn
from docx.opc.part import Part
from docx.opc.constants import RELATIONSHIP_TYPE as RT


def add_alt_chunk(doc: Document, html: str):
    package = doc.part.package
    partname = package.next_partname('/word/altChunk%d.html')
    alt_part = Part(partname, 'text/html', html.encode(), package)
    r_id = doc.part.relate_to(alt_part, RT.A_F_CHUNK)
    alt_chunk = OxmlElement('w:altChunk')
    alt_chunk.set(qn('r:id'), r_id)
    doc.element.body.sectPr.addprevious(alt_chunk)


doc = Document()
doc.add_paragraph('Hello')
add_alt_chunk(doc, "<body><strong>I'm an altChunk</strong></body>")
doc.add_paragraph('Have a nice day!')
doc.save('test.docx')

Note: the altChunk parts are only rendered when the document is opened in MS Word.


Solution

  • Well, some hints here anyway. Maybe you can post your working code at the end as a full "answer":

    1. The alt-chunk part needs to start its life as a docx.opc.part.Part object.

      The blob argument should be the bytes of the file, which is often but not always plain text. It must be bytes though, not unicode (characters), so any encoding has to happen before calling Part().

      I expect you can work out the other arguments:

      • package is the overall OPC package, available on document.part.package.
      • You can use docx.opc.package.OpcPackage.next_partname() to get an available partname based on a root template like: "altChunk%s" for a name like "altChunk3". Check what partname prefix Word uses for these, possibly with unzip -l has-an-alt-chunk.docx; should be easy to spot.
      • The content-type is one in docx.opc.constants.CONTENT_TYPE. Check the [Content_Types].xml part in a .docx file that has an altChunk to see what they use.
    2. Once the part is formed, the document_part.relate_to() method will create the proper relationship. If there is more than one relationship (not common), you need to create each one separately. There is only one relationship from one particular part to another; it's just that some parts are related to more than one other part. Check the relationships in an existing .docx to see, but it's a pretty good guess that there is only the one in this case.
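The inspection suggested in the hints above (partname, content type, relationship) can also be done from Python instead of unzip. This is only a rough sketch using the standard library; the function name describe_alt_chunks is made up for illustration, and it assumes the main document part is word/document.xml and that relationship targets are either absolute or relative to /word/ (no "../" handling).

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the OPC package files
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
# Relationship type that python-docx exposes as RT.A_F_CHUNK
AF_CHUNK = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk"
)


def describe_alt_chunks(docx_file):
    """Return (partname, content_type) for each altChunk relationship.

    Hypothetical helper: assumes the main part is word/document.xml.
    """
    with zipfile.ZipFile(docx_file) as zf:
        types = ET.fromstring(zf.read("[Content_Types].xml"))
        rels = ET.fromstring(zf.read("word/_rels/document.xml.rels"))

    # Content types are declared per part (Override) or per extension (Default)
    overrides = {
        o.get("PartName"): o.get("ContentType")
        for o in types.iter(CT_NS + "Override")
    }
    defaults = {
        d.get("Extension").lower(): d.get("ContentType")
        for d in types.iter(CT_NS + "Default")
    }

    found = []
    for rel in rels.iter(REL_NS + "Relationship"):
        if rel.get("Type") != AF_CHUNK:
            continue
        target = rel.get("Target")
        # Relative targets are resolved against /word/ (simplification)
        partname = target if target.startswith("/") else "/word/" + target
        ext = partname.rsplit(".", 1)[-1].lower()
        found.append((partname, overrides.get(partname, defaults.get(ext))))
    return found
```

Running it on a .docx that Word (or another tool) produced with an altChunk should show which partname pattern and content type to pass to Part(). It can also be run on the file written by doc.save() to confirm the part and relationship made it into the package without opening Word.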

    So your code would look something like:

    package = document.part.package
    partname = package.next_partname("altChunkySomethingPrefix")
    content_type = docx.opc.constants.CONTENT_TYPE.THE_RIGHT_MIME_TYPE
    blob = make_the_altChunk_file_bytes()
    
    alt_chunk_part = Part(partname, content_type, blob, package)
    
    rId = document.part.relate_to(alt_chunk_part, RT.A_F_CHUNK)
    etc.
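For the blob step, something along these lines might serve as the make_the_altChunk_file_bytes() placeholder when the chunk is HTML. Treat it as a sketch: the name make_alt_chunk_bytes is invented here, and wrapping the fragment in a full document with an explicit charset declaration is an assumption about what helps Word decode non-ASCII text, not something confirmed above. The only firm requirement from the hints is that the result is bytes, not str.

```python
import html


def make_alt_chunk_bytes(fragment: str, title: str = "") -> bytes:
    """Wrap an HTML fragment in a full document and encode it as UTF-8.

    Hypothetical helper; `fragment` should not contain its own <body> tag.
    """
    document = (
        "<html><head>"
        # Assumption: declaring the charset helps the consumer decode the bytes
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        f"<title>{html.escape(title)}</title>"
        "</head>"
        f"<body>{fragment}</body></html>"
    )
    # Part() expects bytes, so the encoding happens here, before Part() is called
    return document.encode("utf-8")
```

The returned value can then be passed as the blob argument, for example Part(partname, 'text/html', make_alt_chunk_bytes('<p>Hello</p>'), package).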