<member>
<detaileddescription>
Hello
<formula id="39">my</formula>
name
<formula id="102">is</formula>
Buddy.
<formula id="103">I</formula>
am a
<itemizedlist>
<listitem>
superhero
<formula id="104">.</formula>
</listitem>
<listitem>
At least,
<formula id="105">I think</formula>
</listitem>
</itemizedlist>
so...:)
<simplesect kind="see">
What
<ref refid="ref_id" kindref="ref_kindref">do you</ref>
<bold>think</bold> ?
</simplesect>
Let me know. :)
</detaileddescription>
</member>
The following source code works:
from xml.parsers import expat

xmlFile = "xml.xml"

class xmlText(object):
    def __init__(self):
        self.textBuff = ""

    def CharacterData(self, data):
        data = data.strip()
        if data:
            data = data.encode('ascii')
            self.textBuff += str(data) + "\n"

    def Parse(self, fName):
        xmlParser = expat.ParserCreate()
        xmlParser.CharacterDataHandler = self.CharacterData
        xmlParser.Parse(open(fName).read(), 1)

xText = xmlText()
xText.Parse(xmlFile)
print("Text from %s\n=" % xmlFile)
print(xText.textBuff)
Output:
Text from xml.xml
=
b'Hello'
b'my'
b'name'
b'is'
b'Buddy.'
b'I'
b'am a'
b'superhero'
b'.'
b'At least,'
b'I think'
b'so...:)'
b'What'
b'do you'
b'think'
b'?'
b'Let me know. :)'
However, every line of text comes out wrapped as b'...', with a b in front of it.
How can I remove that?
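Remove the line
    data = data.encode('ascii')
In Python 3, expat already passes a str to the CharacterDataHandler. Calling encode('ascii') turns it into a bytes object, and str() on a bytes object returns its repr, b'...', which is where the b prefix and the quotes come from.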
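For reference, here is a minimal sketch of the same handler with that line dropped. It keeps the class and method names from the question, but as an assumption for the sake of a self-contained example, Parse takes the XML as a string instead of a file name, and the inline sample string stands in for xml.xml.

```python
from xml.parsers import expat


class xmlText(object):
    def __init__(self):
        self.textBuff = ""

    def CharacterData(self, data):
        # expat delivers str in Python 3, so it can be appended directly.
        data = data.strip()
        if data:
            self.textBuff += data + "\n"

    def Parse(self, xmlSource):
        # Assumption: xmlSource is the XML text itself, not a file name.
        xmlParser = expat.ParserCreate()
        xmlParser.CharacterDataHandler = self.CharacterData
        xmlParser.Parse(xmlSource, True)


# Hypothetical stand-in for the contents of xml.xml.
sample = '<member>Hello<formula id="39">my</formula>name</member>'

xText = xmlText()
xText.Parse(sample)
print(xText.textBuff)
```

To read from a file as in the question, pass open(fName).read() to Parse; the handler itself does not need to change.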