I am new to PyPy and wanted to check whether it can speed up my application. The PyPy documentation says that PyPy supports the standard Python libraries with minor exceptions. The problem is that a simple test case using ElementTree for XML parsing behaves differently under PyPy: only the first letter of each tag is preserved.
Sample input XML (from ElementTree documentation):
<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
</data>
My Python code (saved as ettest.py):
import xml.etree.ElementTree as ET
tree = ET.parse('ettest.xml')
root = tree.getroot()
print root.tag
Console output:
$ python ettest.py
data
$ pypy ettest.py
d
PyPy prints only the first letter of the tag. I thought ElementTree was pure Python, so I am wondering whether this is a PyPy bug or whether I am missing some PyPy magic.
For reference, I am running Windows 10 64-bit with the following Python and PyPy versions:
$ python -V
Python 2.7.13 :: Continuum Analytics, Inc.
$ pypy --version
Python 2.7.13 (c925e7381036, Jun 06 2017, 05:28:16)
[PyPy 5.8.0 with MSC v.1500 32 bit]
Yes, this is a known bug involving the expat library (most likely PyPy's bindings to it). So far it shows up only on Windows with recent versions of PyPy, and the cause is not yet known: https://bitbucket.org/pypy/pypy/issues/2641/
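ElementTree's default parser is built on the standard library's expat bindings (xml.parsers.expat), so one way to check whether a given interpreter is affected is to ask expat directly which element names it reports. The following is only a minimal sketch: the SAMPLE string and the collect_start_tags helper are made up for illustration and are not part of the bug report. It runs unchanged under both CPython and PyPy.

```python
import xml.parsers.expat

# Hypothetical sample, shortened from the XML in the question.
SAMPLE = (b'<?xml version="1.0"?>'
          b'<data><country name="Liechtenstein"><rank>1</rank></country></data>')


def collect_start_tags(xml_bytes):
    """Return the element names expat reports, in document order."""
    names = []

    def on_start(name, attrs):
        # expat calls this once per opening tag.
        names.append(name)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(xml_bytes, True)  # True: this is the final chunk of input
    return names


tags = collect_start_tags(SAMPLE)
print(", ".join(tags))
```

On an unaffected interpreter this should list the full names data, country, rank. If the names come back truncated here as well, the problem sits in the expat layer rather than in ElementTree itself.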