Search code examples
pythonsaxonsaxon-c

Why does this code use more and more memory over time?


Python: 3.11 Saxonche: 12.4.2

My website keeps consuming more and more memory until the server runs out of memory and crashes. I isolated the problematic code to the following script:

import gc
from time import sleep

from saxonche import PySaxonProcessor


xml_str = """
<root>
    <stuff>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum ac auctor ex. Nunc in tincidunt urna. Sed tincidunt eros lacus, sed pulvinar sem venenatis et. Donec euismod orci quis pellentesque sagittis. Donec at tortor in dui mattis facilisis. Pellentesque vel varius lectus. Nunc sed gravida risus, ac finibus elit. Etiam sollicitudin nunc a velit efficitur molestie in ac lectus. Donec vulputate orci odio, sit amet hendrerit odio rhoncus commodo.</stuff>
    <stuff>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum ac auctor ex. Nunc in tincidunt urna. Sed tincidunt eros lacus, sed pulvinar sem venenatis et. Donec euismod orci quis pellentesque sagittis. Donec at tortor in dui mattis facilisis. Pellentesque vel varius lectus. Nunc sed gravida risus, ac finibus elit. Etiam sollicitudin nunc a velit efficitur molestie in ac lectus. Donec vulputate orci odio, sit amet hendrerit odio rhoncus commodo.</stuff>
    <stuff>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum ac auctor ex. Nunc in tincidunt urna. Sed tincidunt eros lacus, sed pulvinar sem venenatis et. Donec euismod orci quis pellentesque sagittis. Donec at tortor in dui mattis facilisis. Pellentesque vel varius lectus. Nunc sed gravida risus, ac finibus elit. Etiam sollicitudin nunc a velit efficitur molestie in ac lectus. Donec vulputate orci odio, sit amet hendrerit odio rhoncus commodo.</stuff>
</root>
"""

while True:
    print('Running once...')
    with PySaxonProcessor(license=False) as proc:
        proc.parse_xml(xml_text=xml_str)

    gc.collect()
    sleep(1)

This script consumes memory at a rate of about 0.5 MB per second. The memory usage does not plateau after a while. I have logs showing that memory usage continues to grow for hours until the server runs out of memory and crashes.

Other things I tried that aren't shown above:

  • Using a PyDocumentBuilder to parse the XML instead of a PySaxonProcessor. It didn't appear to change anything.
  • Deleting the Saxon processor and the return value of parse_xml() using the del Python keyword. No change.

I have to use Saxon instead of lxml because I need XPath 3.0 support.

What am I doing wrong? How do I parse XML using Saxon in a way that doesn't leak?


A few folks have suggested that instantiating the PySaxonProcessor once before the loop will fix the leak. It doesn't. This still leaks:

with PySaxonProcessor(license=False) as proc:
    while True:
        print('Running once...')
        proc.parse_xml(xml_text=xml_str)

        gc.collect()
        sleep(1)

Solution

  • It looks like a memory leak. I created a bug to track it: https://saxonica.plan.io/issues/6391

    And the issue is now fixed in the released SaxonC 12.5.