
pypy: elementtree tag name only preserves first letter?


I am new to PyPy and wanted to check whether it can speed up my application. The PyPy documentation says that PyPy supports the standard Python libraries with minor exceptions. The problem is that a simple test case using ElementTree for XML parsing behaves differently under PyPy: only the first letter of each tag is preserved.

Sample input XML (from ElementTree documentation):

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
</data>

My python code:

import xml.etree.ElementTree as ET
tree = ET.parse('ettest.xml')
root = tree.getroot()
print root.tag

Console output:

$ python ettest.py
data

$ pypy ettest.py
d

PyPy prints only the first letter of the tag. I think ElementTree is pure Python, so I am wondering: is this a PyPy bug, or am I missing some PyPy magic?

For reference, I am running under Windows 10 64-bit with the following Python and PyPy versions:

$ python -V
Python 2.7.13 :: Continuum Analytics, Inc.

$ pypy --version
Python 2.7.13 (c925e7381036, Jun 06 2017, 05:28:16)
[PyPy 5.8.0 with MSC v.1500 32 bit]

Solution

  • Yes, this is a known bug involving the expat library, most likely in PyPy's bindings for it. So far it has only shown up on Windows with recent versions of PyPy, and the cause is not yet known: https://bitbucket.org/pypy/pypy/issues/2641/
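
To check whether a given interpreter is affected, a small self-test along these lines may help. It is only a sketch: the names SAMPLE, EXPECTED_TAGS, parsed_tags and tags_look_truncated are made up for illustration, and it assumes the truncation also appears when parsing from a string with ET.fromstring rather than from a file, which the bug report does not confirm.

```python
from __future__ import print_function

import platform
import xml.etree.ElementTree as ET

# Hypothetical in-memory sample, trimmed from the XML in the question.
SAMPLE = '<data><country name="Liechtenstein"><rank>1</rank></country></data>'
EXPECTED_TAGS = ["data", "country", "rank"]


def parsed_tags(xml_text):
    """Return the tag names in document order, as ElementTree reports them."""
    root = ET.fromstring(xml_text)
    return [elem.tag for elem in root.iter()]


def tags_look_truncated(xml_text=SAMPLE, expected=EXPECTED_TAGS):
    """Return True if the parsed tags differ from the names written in the XML."""
    return parsed_tags(xml_text) != expected


if __name__ == "__main__":
    # Show which interpreter is running, then the tags it actually sees.
    print(platform.python_implementation(), platform.python_version())
    print("tags:", parsed_tags(SAMPLE))
    print("truncated:", tags_look_truncated())
```

Running the same file with both python and pypy makes the difference visible side by side; on an unaffected interpreter the last line should report that the tags are not truncated.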