Tags: python, xml, parsing, lxml.objectify

Unexpected results parsing XML in Python


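I'm trying to extract the full text of the <title> element from the XML below, so that I end up with

title_text = word1 Word2 word3 word4

The problem is that the code below gives me only title_text = 'word1'.

How can I get the complete text, including the words inside and after the <hlword> elements?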

XML:

<response>...<results>...<grouping>...<group>...
    <doc>...
         <title>
             word1
             <hlword>Word2</hlword>
             <hlword>word3</hlword>
             word4
          </title>
          ...
    </doc>
</group>...</grouping>...</results>...</response>...

Code for parse:

from lxml import objectify
...
tree = objectify.fromstring(xml)
nodes = tree.response.results.grouping.group
for node in nodes:
    title_element = node.doc.title
    title_text = title_element.text
    print(title_text)

Solution

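  • .text returns only the text that precedes the first child element; the text after each <hlword> is stored on that child's .tail. To collect all of it, iterate over .itertext():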

    >>> for node in nodes:
    ...    print(' '.join(node.doc.title.itertext()))
    ...
    word1 Word2 word3 word4
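
If the <title> element is indented as in the question, the fragments returned by .itertext() also carry the surrounding newlines and spaces, so it may help to normalize whitespace after joining. The following is a minimal sketch, not the answerer's code. It uses the standard library's xml.etree.ElementTree so that it runs without lxml installed; lxml elements expose the same .text, .tail and itertext() members. The sample string and the variable names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample that mirrors the <title> element from the question.
xml = """<title>
    word1
    <hlword>Word2</hlword>
    <hlword>word3</hlword>
    word4
</title>"""

title = ET.fromstring(xml)

# .text holds only the text that precedes the first child element.
first_chunk = title.text.strip()

# Text that follows a child element is stored on that child's .tail.
tails = [child.tail.strip() for child in title]

# itertext() yields every text fragment in document order. Calling
# split() with no arguments drops the indentation and newlines, and the
# outer join puts exactly one space between the remaining words.
title_text = ' '.join(' '.join(title.itertext()).split())

print(first_chunk)
print(tails)
print(title_text)
```

Here first_chunk is what the original .text lookup sees, tails shows where the trailing "word4" is stored, and title_text is the full, whitespace-normalized title.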