Handling escaped character data when parsing XML with expat in Python

I am attempting to parse an XML file using Python's expat module. I have the following line in my XML file:

<Action>&lt;fail/&gt;</Action>

expat identifies the start and end tags, but it converts &lt; to the less-than character and &gt; to the greater-than character, and so reports the content like this:

outcome:

START 'Action'
DATA '<'
DATA 'fail/'
DATA '>'
END 'Action'

instead of the desired:

START 'Action'
DATA '&lt;fail/&gt;'
END 'Action'

I would like to get the desired outcome. How do I prevent expat from messing up?

Solution

expat is not messing up: &lt; is simply the XML encoding for the character <. On the contrary, if expat returned the literal string &lt;, that would be a bug with respect to the XML spec. That said, you can of course get the escaped version back by using xml.sax.saxutils.escape:

>>> from xml.sax.saxutils import escape
>>> escape("<fail/>")
'&lt;fail/&gt;'

The expat parser is also free to report character data in whatever chunks it sees fit, so you have to concatenate them yourself.
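
Putting both points together, here is a minimal sketch of one way to do it, not the only one. It buffers the character data chunks, joins them when the next tag arrives, and re-escapes the joined text. The names parse_events, flush and chunks are made up for this illustration; only ParserCreate, the three handler attributes, Parse and escape come from the standard library.

```python
from xml.parsers import expat
from xml.sax.saxutils import escape


def parse_events(xml_text):
    """Return (event, value) tuples with character data merged and re-escaped.

    parse_events is a hypothetical helper written for this example.
    """
    events = []
    chunks = []  # character data collected since the last tag

    def flush():
        # Emit everything buffered so far as a single DATA event.
        if chunks:
            events.append(("DATA", escape("".join(chunks))))
            chunks.clear()

    def start(name, attrs):
        flush()
        events.append(("START", name))

    def end(name):
        flush()
        events.append(("END", name))

    parser = expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    # expat may call this several times for one run of text.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(xml_text, True)
    return events


events = parse_events("<Action>&lt;fail/&gt;</Action>")
for event, value in events:
    print(event, repr(value))
```

For the sample line from the question this prints START 'Action', DATA '&lt;fail/&gt;' and END 'Action', one per line. Note that escape only restores &, < and >, so the result is an escaped form of the text and not necessarily byte-for-byte what was in the file (a character reference such as &#60; would also come back as &lt;).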