Change tab spacing in python lxml prettyprint


I have a small script that creates an XML document, and with pretty_print=True it produces a correctly formatted document. However, each indent level is 2 spaces, and I am wondering whether there is a way to change this to 4 spaces (I think it looks better). Is there a simple way to do this?

Code snippet:

doc = lxml.etree.SubElement(root, 'dependencies')
for depen in dependency_list:
    dependency = lxml.etree.SubElement(doc, 'dependency')
    lxml.etree.SubElement(dependency, 'groupId').text = depen.group_id
    lxml.etree.SubElement(dependency, 'artifactId').text = depen.artifact_id
    lxml.etree.SubElement(dependency, 'version').text = depen.version
    if depen.scope == 'provided' or depen.scope == 'test':
        lxml.etree.SubElement(dependency, 'scope').text = depen.scope
    exclusions = lxml.etree.SubElement(dependency, 'exclusions')
    exclusion = lxml.etree.SubElement(exclusions, 'exclusion')
    lxml.etree.SubElement(exclusion, 'groupId').text = '*'
    lxml.etree.SubElement(exclusion, 'artifactId').text = '*'
tree.write('explicit-pom.xml', pretty_print=True)

Solution

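  • The pretty_print option itself has no setting for the indent width. (lxml 4.5 and later add lxml.etree.indent(), which re-indents a tree with any whitespace string before it is serialized.)

    A workaround that post-processes the serialized output, turning each two-space indent level into a tab, would be: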

    import lxml.etree

    def prettyPrint(someRootNode):
        # Serialize with lxml's default two-space indentation.
        xml = lxml.etree.tostring(someRootNode, encoding="utf-8", pretty_print=True).decode("utf-8")
        lines = xml.split("\n")
        for i, line in enumerate(lines):
            # Each complete pair of leading spaces is one indentation level.
            leading = len(line) - len(line.lstrip(" "))
            depth = leading // 2
            # Replace each level with a tab (use "    " here for four spaces).
            lines[i] = "\t" * depth + line[2 * depth:]
        return "\n".join(lines)
    

    Please note that this is not very efficient. High efficiency can only be achieved if this functionality is natively implemented within the lxml C code.
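
For newer installations, re-indenting the tree before serializing may be simpler than rewriting the output string. The sketch below assumes lxml 4.5 or later, where etree.indent() is available. The helper name to_indented_xml and the sample element values are hypothetical, chosen only for illustration. The fallback import is there so the sketch also runs without lxml, since the standard library has offered an equivalent xml.etree.ElementTree.indent() since Python 3.9.

```python
try:
    from lxml import etree  # lxml >= 4.5 provides etree.indent()
except ImportError:
    # Fallback so the sketch still runs without lxml: the standard library
    # has had an equivalent indent() function since Python 3.9.
    import xml.etree.ElementTree as etree


def to_indented_xml(root, space="    "):
    """Hypothetical helper: re-indent `root` in place and return it as a string."""
    # indent() rewrites whitespace-only text/tail values; it mutates the tree.
    etree.indent(root, space=space)
    return etree.tostring(root, encoding="unicode")


# Sample data invented for illustration.
root = etree.Element("dependencies")
dependency = etree.SubElement(root, "dependency")
etree.SubElement(dependency, "groupId").text = "org.example"
etree.SubElement(dependency, "artifactId").text = "demo"

xml_text = to_indented_xml(root)
print(xml_text)
```

For the script in the question, this would presumably amount to calling etree.indent(tree, space="    ") just before tree.write('explicit-pom.xml'). Note that indent() changes the tree itself, so any later serialization keeps the new indentation.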