
python-pptx Adding Custom Properties to the pptx file


I am trying to add custom properties to a pptx file using the python-pptx library. Adding the property raises no error, but saving the file afterwards fails.

Environment: Python 3.10.12, Databricks Runtime 13.3 LTS

from pptx import Presentation
from pptx.util import Inches, Pt
from pptx.opc.constants import RELATIONSHIP_TYPE AS RT
from pptx.oxml.ns import nsdecls
from pptx.oxml import parse_xml

pptx = Presentation()
first_slide_layout = pptx.slide_layouts[0]

slide = pptx.slides.add_slide(first_slide_layout)

slide.shapes.title.text = "Created by Python-pptx"

pptx.save("databricks/driver/test.pptx")
####-----> until here code works.

## Below part of adding custom properties does not work.

def add_custom_properties(ppt, name, value):
        custom_property_xml = f'<properties xmlns = "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties">'\
                     f'<property name="{name}" type="string">{value}</property>'\
                      '</properties>'
        custom_property = parse_xml(custom_property_xml)

        ppt.part.package.relate_to(custom_property, 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/custom-properties')

# Below command does not return any error.
add_custom_properties(pptx, "test_key", "test_key_value")

# while saving throws an error.
pptx.save("/databricks/driver/aaa.pptx")

Error: file can either be a file-path or a file-like object open for writing bytes.

Can someone help me understand what is wrong with this code?


Solution

  • Simple syntax errors and the wrong file path aside, this code looks as if an AI had guessed it from other code snippets. AI is not yet able to program.

    Package.relate_to needs a package part to relate to, not the lxml element that parse_xml returns.

    And the relationship is not the whole story. First, a package part must be created. python-pptx does not support a custom properties part so far, so the default pptx template contains no such part. It must therefore be created using PartFactory.

    Furthermore, the XML of the Properties element is wrong. See "Set a custom property in a word processing document" for an example and descriptions. That page covers word processing documents, but the same applies to presentations.

    The linked learn.microsoft.com page describes:

    • Each property in the XML content consists of an XML element that includes the name and the value of the property.
    • For each property, the XML content includes an fmtid attribute, which is always set to the same string value: {D5CDD505-2E9C-101B-9397-08002B2CF9AE}.
    • Each property in the XML content includes a pid attribute, which must include an integer starting at 2 for the first property and incrementing for each successive property.
    • Each property tracks its type (in the figure, the vt:lpwstr and vt:filetime element names define the types for each property).

    As this comes from Microsoft itself, you will not find a more reliable description. As for the property types, I can tell you from my own experience that there are lpwstr, filetime and bool, as well as i4 for integer numbers and r8 for floating-point numbers.
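The rules listed above can be sketched with the standard library alone, independent of python-pptx. This is only an illustration under stated assumptions: the helper names new_properties_root, append_property and to_bytes are made up for this example, and the next pid is taken as one more than the highest existing pid. Because ElementTree serializes the tree, names and values containing characters such as & or < are escaped automatically.

```python
import xml.etree.ElementTree as ET

CP_NS = "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"
VT_NS = "http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes"
FMTID = "{D5CDD505-2E9C-101B-9397-08002B2CF9AE}"  # always the same value

# Serialize with a default namespace and the conventional "vt" prefix.
ET.register_namespace("", CP_NS)
ET.register_namespace("vt", VT_NS)


def new_properties_root():
    """Create an empty <Properties> root element (hypothetical helper)."""
    return ET.Element(f"{{{CP_NS}}}Properties")


def append_property(root, name, value, vt_type="lpwstr"):
    """Append one typed property and return the pid that was assigned."""
    existing = root.findall(f"{{{CP_NS}}}property")
    # pids start at 2; continue after the highest pid already present.
    pid = max((int(p.get("pid")) for p in existing), default=1) + 1
    prop = ET.SubElement(
        root,
        f"{{{CP_NS}}}property",
        {"fmtid": FMTID, "pid": str(pid), "name": name},
    )
    typed = ET.SubElement(prop, f"{{{VT_NS}}}{vt_type}")
    typed.text = str(value)
    return pid


def to_bytes(root):
    """Serialize the tree to UTF-8 bytes including the XML declaration."""
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    root = new_properties_root()
    append_property(root, "test_key", "test_key_value", "lpwstr")
    append_property(root, "number", 1234, "i4")
    print(to_bytes(root).decode("utf-8"))
```

The resulting bytes could serve as the blob of the custom properties part, in place of XML assembled by string concatenation.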

    The following code works for me and creates an aaa.pptx with three custom properties set.

    from pptx import Presentation
    
    presentation = Presentation()
    first_slide_layout = presentation.slide_layouts[0]
    
    slide = presentation.slides.add_slide(first_slide_layout)
    
    slide.shapes.title.text = 'Created by Python-pptx'
    
    presentation.save('./test.pptx')
    ####-----> until here code works.
    
    ## Below part of adding custom properties works too now.
    
    from pptx.opc.constants import (
        RELATIONSHIP_TYPE,
        CONTENT_TYPE,
    )
    from pptx.opc.packuri import PackURI
    from pptx.opc.package import PartFactory
    
    def add_custom_properties(presentation, name, value, value_type):
        contains_custom_property = False
        for part in presentation.part.package.iter_parts():
            if part.partname == '/docProps/custom.xml':
                custom_property_part = part
                contains_custom_property = True
                
        pid = 2    
        if not contains_custom_property:
            custom_property_xml = '<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/custom-properties" xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes">'
            custom_property_xml += f'<property fmtid="{{D5CDD505-2E9C-101B-9397-08002B2CF9AE}}" pid="{pid}" name="{name}">'
            custom_property_xml += f'<vt:{value_type}>{value}</vt:{value_type}>'
            custom_property_xml += '</property>'
            custom_property_xml += '</Properties>'
            
            custom_property_pack_uri = PackURI('/docProps/custom.xml')
            custom_property_content_type = CONTENT_TYPE.OFC_CUSTOM_PROPERTIES
            custom_property_part = PartFactory(custom_property_pack_uri, custom_property_content_type, presentation.part.package, custom_property_xml.encode('utf-8'))
    
            presentation.part.package.relate_to(custom_property_part, RELATIONSHIP_TYPE.CUSTOM_PROPERTIES)
        else:
            custom_property_xml = custom_property_part.blob.decode('utf-8')
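            # Strip the closing </Properties> tag so another property can be appended.
            custom_property_xml = custom_property_xml.rstrip()[:-len('</Properties>')]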
            
            pid = 2 + custom_property_xml.count('<property')
            
            custom_property_xml += f'<property fmtid="{{D5CDD505-2E9C-101B-9397-08002B2CF9AE}}" pid="{pid}" name="{name}">'
            custom_property_xml += f'<vt:{value_type}>{value}</vt:{value_type}>'
            custom_property_xml += '</property>'    
            custom_property_xml += '</Properties>'
            custom_property_part.blob = custom_property_xml.encode('utf-8')
    
    add_custom_properties(presentation, "test_key", "test_key_value", "lpwstr")
    add_custom_properties(presentation, "number", "1234", "i4")
    # vt:filetime is an xsd:dateTime, so pass a full ISO 8601 timestamp.
    add_custom_properties(presentation, "date", "2024-08-06T00:00:00Z", "filetime")
    
    # Saving no longer throws an error.
    presentation.save('./aaa.pptx')
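
To check the result without opening PowerPoint, the saved file can be inspected as a plain ZIP package. The following is a minimal sketch using only the standard library. The helper name read_custom_properties is made up for this example, and it assumes the part is stored at docProps/custom.xml, as in the code above.

```python
import zipfile
import xml.etree.ElementTree as ET

CP_NS = "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"


def read_custom_properties(pptx_file):
    """Return {name: (vt_type, text)} read from docProps/custom.xml.

    pptx_file may be a path or a binary file-like object.
    Returns an empty dict when the package has no custom properties part.
    """
    with zipfile.ZipFile(pptx_file) as package:
        if "docProps/custom.xml" not in package.namelist():
            return {}
        root = ET.fromstring(package.read("docProps/custom.xml"))

    properties = {}
    for prop in root.findall(f"{{{CP_NS}}}property"):
        if len(prop) == 0:
            continue  # skip a property that has no typed child element
        typed = prop[0]
        # Tag looks like "{namespace}lpwstr"; keep only the local name.
        vt_type = typed.tag.split("}", 1)[1]
        properties[prop.get("name")] = (vt_type, typed.text)
    return properties


if __name__ == "__main__":
    import os

    # Only inspect the file if the code above has already produced it.
    if os.path.exists("./aaa.pptx"):
        for name, (vt_type, text) in read_custom_properties("./aaa.pptx").items():
            print(name, vt_type, text)
```

If the relationship and the part were written correctly, each property added by add_custom_properties should appear here with its vt type.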