Tags: python, python-2.7, lxml, nodes, removechild

Delete a text node with lxml in Python


I have found a lot of examples of removing an element node from an XML file, but here is a case for which I didn't find any solution, either on Stack Overflow or Google. For example:

<slide>
    America
    <a> 2 </a>
    <b> 3 </b>
    <c> 4 </c>
</slide>

<slide>
    Germany
    <a> 5 </a>
    <b> 6 </b>
    <c> 7 </c>
</slide>

Since I am using lxml, I would normally delete an element node with the remove() function. But now I have to delete "America" and "Germany", which are not element nodes but text.

Is there a function, or some other way, to remove this text?

I am currently using the Python lxml library.

The output should look like this:

 <slide>
     <a> 2 </a>
     <b> 3 </b>
     <c> 4 </c>
 </slide>

 <slide>
     <a> 5 </a>
     <b> 6 </b>
     <c> 7 </c>
 </slide>
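Why remove() does not help here: lxml does not store text as separate child nodes. Text that comes before an element's first child is kept in that element's .text attribute, and text that follows a child is kept in the child's .tail attribute. A minimal sketch using lxml.etree illustrates this. It assumes a hypothetical <slides> root element, added only so the fragment is one well-formed XML document:

```python
from lxml import etree

# First <slide> from the question, wrapped in a hypothetical <slides> root
# so that the input is a single well-formed XML document.
xml = b"""<slides>
<slide>
    America
    <a> 2 </a>
    <b> 3 </b>
    <c> 4 </c>
</slide>
</slides>"""

root = etree.fromstring(xml)
slide = root.find("slide")

# Text before the first child element lives in the parent's .text
print(repr(slide.text))     # '\n    America\n    '
# Text after a child element lives in that child's .tail
print(repr(slide[0].tail))  # '\n    '
# A child's own content lives in the child's .text
print(repr(slide[0].text))  # ' 2 '
```

So getting rid of "America" means changing slide.text, not removing a node.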

Solution

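  • Use the text property. Text that appears before an element's first child (here "America" and "Germany") is stored in that element's text attribute, so setting it to an empty string removes it. For example: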

    html = '''...
    <slide>
        America
        <a> 2 </a>
        <b> 3 </b>
        <c> 4 </c>
    </slide>
    
    <slide>
        Germany
        <a> 5 </a>
        <b> 6 </b>
        <c> 7 </c>
    </slide>
    ....'''
    
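    import lxml.html

    root = lxml.html.fromstring(html)
    # "America"/"Germany" come before each slide's first child element,
    # so lxml stores them in the slide's .text attribute.
    for slide in root.xpath('.//slide'):
        # Replaces the whole leading text, including the whitespace before <a>
        slide.text = ''

    print(lxml.html.tostring(root))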
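Because the question is about an XML file, here is a sketch of the same idea with lxml.etree instead of lxml.html. It assumes the same hypothetical <slides> wrapper element, and remove_leading_text is a hypothetical helper name. Unlike slide.text = '', it keeps the newline and indentation that followed the removed word, so the result stays indented:

```python
from lxml import etree


def remove_leading_text(root, tag="slide"):
    """Hypothetical helper: drop the words before the first child of each <tag>.

    Only the trailing whitespace of .text is kept, so indentation survives.
    Text stored in child .tail attributes is not touched.
    """
    for el in root.iter(tag):
        if el.text is not None and el.text.strip():
            stripped = el.text.rstrip()          # e.g. '\n    America'
            el.text = el.text[len(stripped):]    # keep '\n    '
    return root


# The question's two slides, wrapped in a hypothetical <slides> root.
xml = b"""<slides>
<slide>
    America
    <a> 2 </a>
    <b> 3 </b>
    <c> 4 </c>
</slide>
<slide>
    Germany
    <a> 5 </a>
    <b> 6 </b>
    <c> 7 </c>
</slide>
</slides>"""

root = remove_leading_text(etree.fromstring(xml))
out = etree.tostring(root, encoding="unicode")
print(out)
```

For a file on disk, something like tree = etree.parse("slides.xml"), then remove_leading_text(tree.getroot()), then tree.write("slides.xml") should work; "slides.xml" is only a placeholder path.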