python xml elementtree marc

ValueError: list.remove(x): x not in list when trying to remove an element with ElementTree


I have a MARC XML file with two records in a collection. I want to remove the 955 datafields from the file.

When I iterate over the list returned by findall and call root.remove() on each element, I get ValueError: list.remove(x): x not in list.

import xml.etree.ElementTree as ET
tree = ET.parse('toggle.xml')
root = tree.getroot()

for a955 in root.findall('record/datafield[@tag="955"]'):
    root.remove(a955)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[1], line 6
      3 root = tree.getroot()
      5 for a955 in root.findall('record/datafield[@tag="955"]'):
----> 6     root.remove(a955)

ValueError: list.remove(x): x not in list

Here is the xml I'm trying to modify (I've removed a few datafields for brevity):

<collection>
    <record>
        <leader>00859cam a2200277Ia 4500</leader>
        <controlfield tag="005">20170510144913.0</controlfield>
        <controlfield tag="008">880930s1983    enka          00010 eng d</controlfield>
        <datafield tag="035" ind1=" " ind2=" ">
            <subfield code="a">ocm13279646 880930</subfield>
        </datafield>
        <datafield tag="035" ind1=" " ind2=" ">
            <subfield code="9">0674-46060</subfield>
        </datafield>
        <datafield tag="035" ind1=" " ind2=" ">
            <subfield code="a">(StEdNL)1580610-nlsdb-Voyager</subfield>
        </datafield>
        <datafield tag="700" ind1="1" ind2="0">
            <subfield code="a">Evans, Martin.</subfield>
        </datafield>
        <datafield tag="710" ind1="2" ind2="0">
            <subfield code="a">Health Education Council.</subfield>
            <subfield code="w">cn</subfield>
        </datafield>
        <datafield tag="710" ind1="2" ind2="0">
            <subfield code="a">Teachers' Advisory Council on Alcohol and Drug Education.</subfield>
        </datafield>
        <datafield tag="955" ind1=" " ind2=" ">
            <subfield code="a">QP4.88.1745</subfield>
            <subfield code="b">QP4DOT88DOT</subfield>
        </datafield>
        <datafield tag="956" ind1=" " ind2=" ">
            <subfield code="a">NLS</subfield>
        </datafield>
    </record>
    <record>
        <leader>01030cas a2200349 i 4500</leader>
        <controlfield tag="005">20190312175642.0</controlfield>
        <controlfield tag="008">130830c20139999stkwr ne      0   a0eng d</controlfield>
        <datafield tag="015" ind1=" " ind2=" ">
            <subfield code="a">GBB386135</subfield>
            <subfield code="2">bnb</subfield>
        </datafield>
        <datafield tag="022" ind1="1" ind2=" ">
            <subfield code="a">2053-6496</subfield>
        </datafield>
        <datafield tag="035" ind1=" " ind2=" ">
            <subfield code="a">(Uk)016484976</subfield>
        </datafield>
        <datafield tag="035" ind1=" " ind2=" ">
            <subfield code="a">2992934</subfield>
        </datafield>
        <datafield tag="035" ind1=" " ind2=" ">
            <subfield code="a">(StEdNL)5112576-nlsdb-Voyager</subfield>
        </datafield>
        <datafield tag="651" ind1=" " ind2="0">
            <subfield code="a">Troon (Scotland)</subfield>
            <subfield code="v">Newspapers.</subfield>
        </datafield>
        <datafield tag="651" ind1=" " ind2="0">
            <subfield code="a">South Ayrshire (Scotland)</subfield>
            <subfield code="v">Newspapers.</subfield>
        </datafield>
        <datafield tag="752" ind1=" " ind2=" ">
            <subfield code="a">Scotland</subfield>
            <subfield code="b">Strathclyde</subfield>
            <subfield code="d">Troon.</subfield>
            <subfield code="2">blnpn</subfield>
        </datafield>
        <datafield tag="919" ind1=" " ind2=" ">
            <subfield code="a">NBS</subfield>
        </datafield>
        <datafield tag="955" ind1=" " ind2=" ">
            <subfield code="y">2020</subfield>
            <subfield code="b">V000258858</subfield>
        </datafield>
    </record>
</collection>

I was basing this on the example from the ElementTree docs:

for country in root.findall('country'):
    # using root.findall() to avoid removal during traversal
    rank = int(country.find('rank').text)
    if rank > 50:
        root.remove(country)

I'm sure I'm doing something very basic incorrectly but I just can't work out what it is.


Solution

  • Per the documentation:

    remove(subelement)
    Removes subelement from the element.

    In that context a subelement is a direct child. (Sadly the ElementTree docs use the term loosely, so in other contexts "subelement" means any element below the current one.) remove does not search the entire tree for the element you ask it to remove; it only looks at the element's own children.

    The elements you select are not children of root (the collection element); they are children of record. So you can't remove them via root; you need a handle on the record that contains each one.

    As far as I know ElementPath is implemented in pure Python, so you lose little by writing the traversal out longhand:

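    for record in root.iter('record'):
        # walk the children in reverse so removing one does not skip its sibling
        for c in reversed(record):
            if c.tag == 'datafield' and c.get('tag') == '955':
                # remove via the direct parent (record), not via root
                record.remove(c)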
    

    You could also index and slice out the elements to remove but that seems a bit much.
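
For completeness, the slicing alternative could look roughly like the sketch below. It is only an illustration: SAMPLE is a cut-down stand-in for the file in the question, and is_955 is a hypothetical helper name, not part of ElementTree. It relies on Element supporting slice assignment, so each record's children are replaced by a filtered list.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for the file in the question (assumed sample data).
SAMPLE = """\
<collection>
    <record>
        <datafield tag="035" ind1=" " ind2=" "><subfield code="a">ocm13279646 880930</subfield></datafield>
        <datafield tag="955" ind1=" " ind2=" "><subfield code="a">QP4.88.1745</subfield></datafield>
        <datafield tag="956" ind1=" " ind2=" "><subfield code="a">NLS</subfield></datafield>
    </record>
    <record>
        <datafield tag="919" ind1=" " ind2=" "><subfield code="a">NBS</subfield></datafield>
        <datafield tag="955" ind1=" " ind2=" "><subfield code="y">2020</subfield></datafield>
    </record>
</collection>
"""


def is_955(elem):
    """Return True for <datafield tag="955"> elements (hypothetical helper)."""
    return elem.tag == 'datafield' and elem.get('tag') == '955'


root = ET.fromstring(SAMPLE)

for record in root.iter('record'):
    # The filtered list is built first, then slice assignment swaps it in,
    # so no child is removed while the record is being iterated.
    record[:] = [child for child in record if not is_955(child)]

remaining_tags = [field.get('tag') for field in root.iter('datafield')]
print(remaining_tags)  # ['035', '956', '919']
```

On the real file you would keep tree = ET.parse('toggle.xml') and root = tree.getroot() from the question, run the same loop, and then call tree.write() with an output path of your choice to save the result.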