xml.etree.ElementTree .remove


I'm trying to remove elements from an ALTO XML file with remove(). My ALTO file looks like this:

<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/standards/alto/ns-v4# http://www.loc.gov/standards/alto/v4/alto-4-2.xsd">
  <Description>
    <MeasurementUnit>pixel</MeasurementUnit>
    <sourceImageInformation>
      <fileName>filename</fileName>
    </sourceImageInformation>
  </Description>
  <Layout>
    <Page>
      <PrintSpace>
        <TextBlock>
          <Shape><Polygon/></Shape>
          <TextLine>
            <Shape><Polygon/></Shape>
            <String CONTENT="ABCDEF" HPOS="1234" VPOS="1234" WIDTH="1234" HEIGHT="1234" />
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>

And my code is:

import xml.etree.ElementTree as ET
tree = ET.parse("file.xml")
root = tree.getroot()
ns = {'alto': 'http://www.loc.gov/standards/alto/ns-v4#'}
ET.register_namespace("", "http://www.loc.gov/standards/alto/ns-v4#")
for Test in root.findall('.//alto:TextBlock', ns):
    root.remove(Test)
    
tree.write('out.xml', encoding="UTF-8", xml_declaration=True)

Here is the error I get:

ValueError: list.remove(x): x not in list

Thanks a lot for your help 💐


Solution

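  • Parent.remove(Child) works only if Child is a direct sub-element of Parent; it does not search deeper levels of the tree. Your TextBlock elements are children of PrintSpace, not of the root alto element, so root.remove() raises the ValueError. In your case, you have to call remove on PrintSpace.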

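    import xml.etree.ElementTree as ET

    tree = ET.parse("file.xml")
    root = tree.getroot()
    ns = {'alto': 'http://www.loc.gov/standards/alto/ns-v4#'}
    # Keep the default namespace unprefixed when the file is written back out.
    ET.register_namespace("", "http://www.loc.gov/standards/alto/ns-v4#")

    # remove() must be called on the direct parent, so loop over each PrintSpace
    # and remove only the TextBlock elements that are its immediate children.
    for print_space in root.findall('.//alto:PrintSpace', ns):
        for text_block in print_space.findall('alto:TextBlock', ns):
            print_space.remove(text_block)

    tree.write('out.xml', encoding="UTF-8", xml_declaration=True)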
    

    Note: This code is only one example of a working solution. It assumes every TextBlock is a direct child of PrintSpace, and it can certainly be improved.
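
If the TextBlock elements can sit under different parents, one possible improvement is to look up each element's real parent before removing it. ElementTree elements do not keep a reference to their parent, so the sketch below builds a child-to-parent map first. The helper name remove_all and the inline sample document (including its ComposedBlock wrapper) are made up for illustration; they are not part of the original question.

```python
import xml.etree.ElementTree as ET

ALTO_NS = "http://www.loc.gov/standards/alto/ns-v4#"
ns = {"alto": ALTO_NS}

# Hypothetical sample standing in for file.xml: one TextBlock directly under
# PrintSpace and one nested a level deeper.
SAMPLE = f"""<alto xmlns="{ALTO_NS}">
  <Layout>
    <Page>
      <PrintSpace>
        <TextBlock><TextLine/></TextBlock>
        <ComposedBlock>
          <TextBlock><TextLine/></TextBlock>
        </ComposedBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def remove_all(root, path, namespaces):
    """Remove every element matching `path` from its direct parent.

    Returns the number of elements removed.
    """
    # Map each child to its parent, because Element has no parent attribute.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # findall() returns a list, so removing while looping over it is safe.
    targets = root.findall(path, namespaces)
    for element in targets:
        parent_of[element].remove(element)
    return len(targets)


ET.register_namespace("", ALTO_NS)
root = ET.fromstring(SAMPLE)
removed = remove_all(root, ".//alto:TextBlock", ns)
remaining = root.findall(".//alto:TextBlock", ns)
```

With a real file, the same helper would be called on tree.getroot() before tree.write(), exactly as in the code above.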