python xml xlsx xlrd client-side-attacks

How to prevent "billion laughs" DoS attack in Python's xlrd?

The Billion Laughs DoS attack seems preventable by simply stopping entities in XML files from being expanded. Is there a way to do this in Python's xlrd library (i.e. a flag of some sort)? If not, is there a recommended way to avoid the attack?

Solution

Not with xlrd by itself

xlrd currently has no option for preventing any sort of XML bomb. In its source code, the xlsx data is passed to Python's built-in xml.etree for parsing without any validation:

import xml.etree.ElementTree as ET

def process_stream(self, stream, heading=None):
    if self.verbosity >= 2 and heading is not None:
        fprintf(self.logfile, "\n=== %s ===\n", heading)
    self.tree = ET.parse(stream)

However, it may be possible to patch `ElementTree` using defusedxml

As noted in the comments, defusedxml is a package targeted directly at the problem of security against different types of XML bombs. From the docs:

Instead of:

from xml.etree.ElementTree import parse
et = parse(xmlfile)

alter code to:

from defusedxml.ElementTree import parse
et = parse(xmlfile)

defusedxml can also patch the standard library's XML modules in place. Since the standard library is what xlrd uses for parsing, combining xlrd with defusedxml should let you read Excel files while protecting yourself from XML bombs. Note that the docs describe this function as untested:

Additionally the package has an untested function to monkey patch all stdlib modules with defusedxml.defuse_stdlib().
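If patching the standard library is not an option, the underlying idea (refuse any document that declares entities) can be sketched with the standard library alone. This is a minimal illustration, not how xlrd or defusedxml is implemented: the names `EntitiesForbidden`, `reject_entity_declarations` and `check_xlsx` are made up for this example, and it assumes the .xlsx parts of interest end in `.xml` or `.rels`.

```python
import io
import zipfile
from xml.parsers import expat


class EntitiesForbidden(ValueError):
    """Raised when an XML document declares an entity (hypothetical name)."""


def reject_entity_declarations(xml_bytes):
    """Parse xml_bytes with expat and raise if any <!ENTITY ...> is declared."""
    parser = expat.ParserCreate()

    def forbid_entity(name, is_parameter_entity, value, base,
                      system_id, public_id, notation_name):
        # expat calls this when the declaration is read, which happens
        # before any reference to the entity in the body is expanded.
        raise EntitiesForbidden("entity declaration not allowed: %r" % name)

    parser.EntityDeclHandler = forbid_entity
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk


def check_xlsx(file_or_path):
    """Scan the XML parts of an .xlsx file (a zip archive) for entity declarations.

    Note: this sketch reads each part fully into memory and does not limit
    the decompressed size, so it does not guard against zip bombs.
    """
    with zipfile.ZipFile(file_or_path) as archive:
        for name in archive.namelist():
            if name.endswith((".xml", ".rels")):
                reject_entity_declarations(archive.read(name))
```

The intended use would be to call `check_xlsx(path)` on an untrusted file and only pass it to `xlrd.open_workbook(path)` if no exception is raised. Legitimate .xlsx files should not normally need entity declarations, so rejecting them outright is a deliberately blunt choice.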
