Using cElementTree to extract data from XML in Python

I have an XML file whose structure is similar to the following:

<?xml version="1.0" encoding="UTF-8"?>
<drugbank xmlns="" xmlns:xsi="" xsi:schemaLocation="" version="5.0" exported-on="2017-12-20">
    <drug type="biotech" created="2005-06-13" updated="2017-11-06">
        <drugbank-id primary="true">DB00001</drugbank-id>
    <drug type="biotech" created="2005-06-13" updated="2017-11-06">
        <drugbank-id primary="true">DB00045</drugbank-id>
        <name>Lyme disease vaccine (recombinant OspA)</name>

I am trying to use the cElementTree module in Python 3. I would like to extract the name of each drug in this XML, and I have written the following code to do so:

import xml.etree.cElementTree as ET

tree = ET.parse('fulldatabase.xml')
drugbank = tree.getroot()


for drug in drugbank:
    print(drug.find('name').text)

The error I get is AttributeError: 'NoneType' object has no attribute 'text'

I have also tried checking this, but the answer the OP wrote in it did not work for me. Is there any way to get the name and cas-number fields out of each drug? I have tried some combinations, such as removing findall() from the for loop condition, but that did not work either.


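  • Do you need anything besides the name? If not, this will do it. You're not using the XML namespace properly: every element inherits the default namespace declared in the <drugbank xmlns="" portion of the file, so each tag has to be looked up with that namespace written in braces in front of the tag name. A lookup for the bare tag finds nothing and returns None, which is where the AttributeError comes from.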

    for drug in drugbank.iter('{}name'):
        print(drug.text)
    Lyme disease vaccine (recombinant OspA)

    Here's another way to get the elements you need:

    # Iterate over the root directly; getchildren() was removed in Python 3.9.
    for child in drugbank:
        print({'cas-number': child.find('{}cas-number').text, 'name': child.find('{}name').text})
    {'cas-number': '138068-37-8', 'name': 'Lepirudin'}
    {'cas-number': '205923-56-4', 'name': 'Lyme disease vaccine (recombinant OspA)'}
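
As a rough sketch of the same idea with a prefix map instead of hard-coded braces: the namespace URI below (http://example.com/drugbank) is a placeholder I made up, so substitute whatever your file declares in xmlns. The helper name extract_drugs and the inline sample are also just for illustration. It imports xml.etree.ElementTree because xml.etree.cElementTree was removed in Python 3.9, and it uses findtext() so that a drug with no name or cas-number yields None instead of raising the AttributeError from the question.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI (an assumption): use the xmlns value from your file.
NS = {"db": "http://example.com/drugbank"}

# Small inline sample shaped like the question's file; the first drug has no name.
SAMPLE = """<drugbank xmlns="http://example.com/drugbank" version="5.0">
    <drug type="biotech">
        <drugbank-id primary="true">DB00001</drugbank-id>
    </drug>
    <drug type="biotech">
        <drugbank-id primary="true">DB00045</drugbank-id>
        <name>Lyme disease vaccine (recombinant OspA)</name>
    </drug>
</drugbank>
"""


def extract_drugs(root):
    """Return one dict per <drug>; fields that are absent come back as None."""
    rows = []
    for drug in root.findall("db:drug", NS):
        rows.append({
            # findtext returns the default (None) when the child does not exist,
            # so there is no .text access on a None object.
            "name": drug.findtext("db:name", default=None, namespaces=NS),
            "cas-number": drug.findtext("db:cas-number", default=None, namespaces=NS),
        })
    return rows


root = ET.fromstring(SAMPLE)
drugs = extract_drugs(root)
for row in drugs:
    print(row)
```

For a real file you would call ET.parse('fulldatabase.xml').getroot() instead of ET.fromstring(SAMPLE) and pass that root to extract_drugs.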