I am using Python 3.12 and lxml.
I want to find a particular tag, and I can do it with elem.find("tag"). elem is of type Element.
But I want to move the child elements of this child into the parent, at the position where the child was. For that, I need the index of the child. And I can't find a way to get that index.
lxml's API description has the _Element.index() method, but I have no idea how to get an _Element instance from an Element instance.
Please advise how to determine that index. (Using a loop instead of find() can do that but I'd like a neater way).
EDIT: here is a sample XML element
<parent>
<child-a/>
<container>
<child-b/>
<child-c/>
</container>
<child-d/>
<child-e/>
</parent>
I am writing code that finds <container>, which is a child of <parent>, but I don't know its position in advance (there can be several of them too), and moves its children into the parent where it was, then deletes <container>, to get this:
<parent>
<child-a/>
<child-b/>
<child-c/>
<child-d/>
<child-e/>
</parent>
So, I can find <container> using parent.find(). But to move its children into the same place under <parent> I need to have the index of <container>, as the insert() method requires an index. For now I use this kludge:
while True:
    index = None
    found = None
    for i in range(len(parent)):
        if parent[i].tag == "container":
            found = parent[i]
            index = i
            break
    if found is None:
        break
    offset = 0
    while len(found) > 0:
        parent.insert(index + offset, found[0])
        offset += 1
    parent.remove(found)
I do know that offset is redundant, as one could just increase index; I did that for aesthetic reasons. But the loop itself is quite the kludge. Here is what I would do if Element had an index() method, but it doesn't:
found = parent.find("container")
while found is not None:  # an element without children is falsy, so compare with None
    index = parent.index(found)
    offset = 0
    while len(found) > 0:
        parent.insert(index + offset, found[0])
        offset += 1
    parent.remove(found)
    found = parent.find("container")
But Element.index() does not exist; _Element.index() exists but I don't know how to access _Element.
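You can use container.getchildren() or list(container) to get all children and use addnext() to put them (one by one) after the container; afterwards you can remove the (already) empty container. It needs reversed() to put the children in the correct order, because every addnext() call places the child directly after the container, so the child added last ends up first.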
parent = container.getparent()
#for child in reversed(container.getchildren()):
for child in reversed(container):
    container.addnext(child)
parent.remove(container)
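To see why reversed() matters, here is a minimal sketch. It assumes lxml is installed; the helper names unwrap and tags_after_unwrap are made up for this illustration and are not part of lxml.

```python
from lxml import etree

XML = "<parent><child-a/><container><child-b/><child-c/></container><child-d/></parent>"


def unwrap(container, keep_order):
    """Move the children of `container` next to it, then drop `container`.

    Hypothetical helper, for illustration only.
    """
    children = list(container)      # snapshot, so moving elements cannot disturb the loop
    if keep_order:
        children.reverse()          # add the last child first, so it ends up last
    for child in children:
        container.addnext(child)    # each child lands directly after `container`
    container.getparent().remove(container)


def tags_after_unwrap(keep_order):
    root = etree.fromstring(XML)
    unwrap(root.find("container"), keep_order)
    return [child.tag for child in root]


print(tags_after_unwrap(keep_order=False))  # child-c ends up before child-b
print(tags_after_unwrap(keep_order=True))   # document order is preserved
```

Taking a list() snapshot first is a cautious choice for the sketch; iterating with reversed(container) directly, as in the snippet above, does the same job.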
Full working example which I was using for tests (with some extra comments). It shows:
child.tail = container.tail to clean some indentations,
ET.indent(tree) to clean all indentations,
reversed(container) instead of reversed(container.getchildren()),
container.iterchildren(reversed=True) instead of reversed(container).
html = '''
<parent>
<child-a/>
<container><child-b/><child-c/></container>
<child-d/>
<container><child-e/><child-f/></container>
<child-g/>
</parent>
'''
import lxml.html
tree = lxml.html.fromstring(html)
for container in tree.findall('container'):
    parent = container.getparent()
    #for child in reversed(container.getchildren()):  # getchildren() - deprecated
    #for child in reversed(container):
    for child in container.iterchildren(reversed=True):
        #child.tail = None  # clean indentations # elements in one line
        #child.tail = "\n"  # clean indentations # next tag starts in first column
        #child.tail = container.tail  # clean indentations
        container.addnext(child)
    parent.remove(container)
# https://lxml.de/apidoc/lxml.etree.html#lxml.etree.indent
import lxml.etree as ET
#ET.indent(tree, space='    ') # clean all indentations - use 4 spaces
#ET.indent(tree, space='....') # clean all indentations - use 4 dots - looks like TOC (Table Of Contents) in book :)
ET.indent(tree) # clean all indentations - use (default) 2 spaces
#html = lxml.html.tostring(tree, pretty_print=True).decode()
#html = lxml.html.tostring(tree).decode()
html = lxml.html.tostring(tree, encoding='unicode') # not `utf-8` but `unicode` ???
print(html)
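The commented-out child.tail lines deal with the text that follows an element's closing tag, which lxml stores in .tail. A small sketch of that idea, assuming lxml.etree and a made-up document in which a newline follows the container:

```python
from lxml import etree

root = etree.fromstring(
    "<parent><container><child-b/><child-c/></container>\n<child-d/></parent>"
)
container = root.find("container")

tail = container.tail           # the newline after </container>
print(repr(tail))
print(repr(container[0].tail))  # None: nothing follows <child-b/> inside the container

for child in reversed(list(container)):
    child.tail = tail           # give each moved child the container's trailing whitespace
    container.addnext(child)
root.remove(container)

print([child.tag for child in root])
print(etree.tostring(root, encoding="unicode"))
```

Copying the tail is only a cosmetic fix for the serialized output; the element structure is the same with or without it.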
Result without child.tail = container.tail and without ET.indent(tree):
<parent>
<child-a></child-a>
<child-b></child-b><child-c></child-c><child-d></child-d>
<child-e></child-e><child-f></child-f><child-g></child-g>
</parent>
Result with child.tail = container.tail or with ET.indent(tree):
<parent>
<child-a></child-a>
<child-b></child-b>
<child-c></child-c>
<child-d></child-d>
<child-e></child-e>
<child-f></child-f>
<child-g></child-g>
</parent>
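As for the index() part of the question: in lxml, etree.Element() is a factory function, and the objects it returns (as well as whatever find() returns) are _Element instances, so index() can be called on them directly. A minimal sketch of the index-based variant, assuming lxml.etree; unwrap_by_index is a hypothetical helper name:

```python
from lxml import etree


def unwrap_by_index(parent, tag):
    """Replace every direct child named `tag` with that child's own children."""
    for container in parent.findall(tag):        # findall() returns a list, safe to modify parent
        index = parent.index(container)          # position of the container under parent
        for offset, child in enumerate(list(container)):
            parent.insert(index + offset, child)  # moves the child out of the container
        parent.remove(container)


root = etree.fromstring(
    "<parent><child-a/><container><child-b/><child-c/></container>"
    "<child-d/><child-e/></parent>"
)
print(isinstance(root, etree._Element))  # True
unwrap_by_index(root, "container")
print([child.tag for child in root])
# ['child-a', 'child-b', 'child-c', 'child-d', 'child-e']
```

Inserting in document order at index + offset keeps the children in order, so no reversed() is needed in this variant.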
Doc: addnext(), addprevious(), lxml.etree.indent(), getparent(), getchildren(), iterchildren(reversed=True)