
How can I remove the character `b` from the output?


 <member>
    <detaileddescription>
        Hello
        <formula id="39">my</formula>
        name
        <formula id="102">is</formula>
        Buddy.
        <formula id="103">I</formula>
        am a
        <itemizedlist>
            <listitem>
            superhero
            <formula id="104">.</formula>
            </listitem>
            <listitem>
                At least,
                <formula id="105">I think</formula>
            </listitem>
        </itemizedlist>
        so...:)
        <simplesect kind="see">
            What
            <ref refid="ref_id" kindref="ref_kindref">do you</ref>
            <bold>think</bold> ?
        </simplesect>
        Let me know. :)
    </detaileddescription>
 </member>

The following source code works:

from xml.parsers import expat

xmlFile = "xml.xml"


class xmlText(object):
    def __init__(self):
        self.textBuff = ""

    def CharacterData(self, data):
        data = data.strip()
        if data:
            data = data.encode('ascii')
            self.textBuff += str(data) + "\n"

    def Parse(self, fName):
        xmlParser = expat.ParserCreate()
        xmlParser.CharacterDataHandler = self.CharacterData
        xmlParser.Parse(open(fName).read(), 1)


xText = xmlText()
xText.Parse(xmlFile)
print("Text from %s\n=" % xmlFile)
print(xText.textBuff)

Output

Text from xml.xml
=
b'Hello'
b'my'
b'name'
b'is'
b'Buddy.'
b'I'
b'am a'
b'superhero'
b'.'
b'At least,'
b'I think'
b'so...:)'
b'What'
b'do you'
b'think'
b'?'
b'Let me know. :)'

However, each line of text is printed with the character b in front of it and wrapped in quotes.

How can I remove that?


Solution

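  • Remove the line

    data = data.encode('ascii')

    In Python 3, encode() returns a bytes object, and calling str() on a
    bytes object produces its repr, b'...'. The data passed to CharacterData
    is already a str, so it can be appended to textBuff directly.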
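
A minimal sketch of the handler without the encode() call, assuming Python 3. The names XmlText, character_data, parse_string and the inline sample string are hypothetical and only stand in for the class and the xml.xml file from the question; the sketch parses a string so that it runs without any file on disk.

```python
from xml.parsers import expat

# Where the prefix comes from: encode() yields bytes, and str() of bytes
# returns the repr of the object, including the b'...' wrapper.
prefixed = str("Hello".encode("ascii"))  # "b'Hello'"


class XmlText:
    """Collects the non-blank character data of an XML document."""

    def __init__(self):
        self.chunks = []

    def character_data(self, data):
        # expat already passes a str here, so no encode() is needed.
        data = data.strip()
        if data:
            self.chunks.append(data)

    def parse_string(self, xml_source):
        parser = expat.ParserCreate()
        parser.CharacterDataHandler = self.character_data
        parser.Parse(xml_source, True)  # True marks the final chunk of input
        return "\n".join(self.chunks)


# Hypothetical sample input, much shorter than the XML in the question.
sample = "<member>Hello <formula id='39'>my</formula> name</member>"
result = XmlText().parse_string(sample)
print(result)
```

If the text really must be restricted to ASCII, one option is to decode the bytes back to a str, for example data.encode('ascii').decode('ascii'), which raises UnicodeEncodeError on non-ASCII input instead of silently printing the bytes repr.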