python-docx

Where is word/_rels/document.xml.rels in a python-docx object?


I need the content of word/_rels/document.xml.rels to get the image information. Does python-docx store it?

I use this:

>>> from docx import Document as d
>>> x=d('a.docx')

There seems to be no way to get it from the x object.


Solution

  • python-docx and python-pptx share a common opc subpackage; in python-docx it is docx.opc.

    This layer abstracts the details of the .rels files, among other things.

    You can get to it using:

    >>> from docx import Document
    >>> document = Document('a.docx')
    >>> document_part = document.part
    >>> rels = document_part.rels  # dict-like: maps rId -> relationship
    >>> for rel in rels.values():
    ...     print(rel.rId)
    rId2
    rId1
    rId3
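
    Since the question is about image information, here is a minimal sketch of filtering those relationships down to the image ones. The helper name image_relationships and the IMAGE_RELTYPE constant are made up for this example; the code assumes each relationship object exposes rId, reltype, target_ref and is_external, and that .rels behaves like a dict keyed by rId.

```python
# Relationship-type URI used in OPC packages for embedded or linked images.
IMAGE_RELTYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
)


def image_relationships(part):
    """Return (rId, target_ref, is_external) for each image relationship.

    `part` is expected to be a python-docx part such as `document.part`.
    Hypothetical helper: it only relies on `part.rels` being dict-like.
    """
    found = []
    for rel in part.rels.values():
        if rel.reltype == IMAGE_RELTYPE:
            found.append((rel.rId, rel.target_ref, rel.is_external))
    # Sort so the result does not depend on dict ordering.
    return sorted(found)
```

    With a real file this would be called as image_relationships(Document('a.docx').part). For an embedded image, target_ref is typically a path relative to the document part, such as media/image1.png.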
    

    How you use it most effectively depends on what you're trying to get at. Usually you just want a related part and don't care about navigating the details of the packaging. For that, there are these higher-level accessors:

    • docx.opc.part.Part.part_related_by()
    • docx.opc.part.Part.related_parts[rId]

    In general the route from the object at hand is:

    1. to the part it's contained in (often available on obj.part)
    2. to the related part by use of .part_related_by() (using relationship type) or .related_parts[rId] (it's a dict).
    3. back down to the API object via X_Part.main_obj, e.g. DocumentPart.document
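
    Steps 1 and 2 of that route can be sketched roughly as follows. The function names related_part and describe_part are hypothetical, and the sketch assumes the object at hand has a .part attribute whose related_parts is a dict keyed by rId, and that the related part exposes partname, content_type and blob.

```python
def related_part(obj, rId):
    """Go from an API object to the part that `rId` points at."""
    part = obj.part                  # step 1: the part containing obj
    return part.related_parts[rId]   # step 2: dict-style lookup by rId


def describe_part(obj, rId):
    """Summarize the related part, e.g. an image part behind a picture."""
    target = related_part(obj, rId)
    return {
        "partname": str(target.partname),
        "content_type": target.content_type,
        "size_bytes": len(target.blob),
    }
```

    The rId itself would come from the XML of the object you start from, for example the r:embed attribute on a picture's blip element. A KeyError from related_parts means the containing part has no relationship with that rId.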

    The areas in the code you might be interested in looking closer at are:

    • docx/parts/
    • docx/opc/part.py