Setting element content with lxml removes trailing whitespaces


I am trying to create an SVG image using the lxml library. I create tspan elements to structure and format the text within text elements, but if I set the content of such a tspan element to something like "Hello World ", the trailing whitespace is removed and I get <tspan>Hello World</tspan> as the result. I would like to keep this whitespace.

It is important to mention that I only have access to the tree object; I do not initialize the parser. So if parser flags have to be changed, I cannot access the parser directly. Here is a small example of my code:

#!/usr/bin/env python
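import sys

# Modules from the legacy Inkscape extension API (assumed to be on the path).
import inkex
from simplestyle import formatStyle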

class HelloPlugin(inkex.Effect):
    def __init__(self):
        # Call the base class constructor.
        inkex.Effect.__init__(self)

    def effect(self):
        # Fetch the svg root element (lxml etree element) ...
        svg = self.document.getroot()

        # ... as well as the image width and height.
        width  = inkex.unittouu(svg.get('width'))
        height = inkex.unittouu(svg.get('height'))
        fontSize = 12

        # Create a new layer.
        layer = inkex.etree.SubElement(svg, 'g')
        layer.set(inkex.addNS('label', 'inkscape'), 'Headline Layer')
        layer.set(inkex.addNS('groupmode', 'inkscape'), 'layer')

        # Create the text element, ...
        text = inkex.etree.SubElement(layer, inkex.addNS('text','svg'))
        text.set('x', str(width / 2 + fontSize))
        text.set('y', str(height / 2 + fontSize / 2))

        # ... define text style and position ...
        style = {
            'font-size': str(fontSize)
        }

        # ... and set the text style.
        text.set('style', formatStyle(style))

        # Finally create the tspan element.
        tspan = inkex.etree.SubElement(text, inkex.addNS('tspan','svg'))
        tspan.text = "Hello Plugin "

def main(argv):
    # Instantiate the effect class defined above and run it.
    plugin = HelloPlugin()
    plugin.affect()

if __name__ == "__main__":
    main(sys.argv[1:])

So my question is: how do I have to change the above example to get <tspan>Hello Plugin </tspan> instead of <tspan>Hello Plugin</tspan> in the resulting SVG file?


Solution

  • You are probably looking for the tail property of XML elements in the lxml.etree library. Here's what the docs say about it:

    However, if XML is used for tagged text documents such as (X)HTML, text can also appear between different elements, right in the middle of the tree:

    <html><body>Hello<br/>World</body></html>

    Here, the <br/> tag is surrounded by text. This is often referred to as document-style or mixed-content XML. Elements support this through their tail property. It contains the text that directly follows the element, up to the next element in the XML tree.

    In your case, the content of tail is the whitespace. Here's an example:

    import lxml.etree as ET
    x = ET.fromstring("<foo> <bar>blah</bar>    </foo>")
    bar = x.find("bar")
    print(repr(bar.tail))
    

    Prints:

    '    '
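
    To relate the quoted documentation to the question, here is a minimal sketch that reads the tail of the <br/> element from the docs example and then sets both text and tail on a freshly created tspan. The element names mirror the question, but the tree is built standalone rather than through inkex, and the fallback to the standard library's ElementTree (which shares this part of the API) is only an assumption made so the snippet runs without lxml installed. Whether the Inkscape extension pipeline behaves the same way is not confirmed by anything above.

```python
try:
    import lxml.etree as ET
except ImportError:
    # Assumption: the standard library exposes the same text/tail API.
    import xml.etree.ElementTree as ET

# Mixed-content example from the quoted documentation.
root = ET.fromstring("<html><body>Hello<br/>World</body></html>")
br = root.find("body/br")
print(repr(br.tail))  # the text that directly follows <br/>

# Build a small text/tspan tree by hand (no inkex involved).
text = ET.Element("text")
tspan = ET.SubElement(text, "tspan")
tspan.text = "Hello Plugin "  # content inside the element, trailing space included
tspan.tail = " "              # content after the closing </tspan> tag

# Serialize and inspect what was actually written.
serialized = ET.tostring(text).decode()
print(serialized)
```

    In this sketch the trailing space in text survives serialization, and the tail ends up after the closing tag, so the two properties control different parts of the output.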