The Billion Laughs DoS attack seems preventable by simply stopping entities in XML files from being expanded. Is there a way to do this in Python's xlrd library (i.e. a flag of some sort)? If not, is there a recommended way to avoid the attack?
There is no option in xlrd at this time for preventing any sort of XML bomb. In the source code, the xlsx data is passed to Python's built-in xml.etree.ElementTree for parsing, without any safeguards against entity expansion:
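```python
import xml.etree.ElementTree as ET

# Excerpt from xlrd's xlsx module (a method of its XML-handling class).
# fprintf is xlrd's own logging helper; this snippet is not standalone.
def process_stream(self, stream, heading=None):
    if self.verbosity >= 2 and heading is not None:
        fprintf(self.logfile, "\n=== %s ===\n", heading)
    # The stream is handed straight to the stdlib parser, entities and all.
    self.tree = ET.parse(stream)
```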
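To confirm that the standard-library parser really does expand entities, here is a small, harmless sketch that uses only xml.etree.ElementTree (the document string is made up for illustration). One entity is defined in terms of another, so a few bytes of input turn into several copies of the text; the real attack nests roughly ten levels of ten-fold copies. According to the XML vulnerabilities table in the Python documentation, Expat 2.4.1 and newer include protection against billion laughs, so it is also worth checking which Expat version your Python links against.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat

# A tiny, harmless version of the "billion laughs" pattern:
# lol2 expands to three copies of lol, and the body references lol2 twice.
DOC = """<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol2 "&lol;&lol;&lol;">
]>
<lolz>&lol2;&lol2;</lolz>"""

root = ET.fromstring(DOC)
print(root.text)       # lollollollollollol (2 x 3 = 6 copies, silently expanded)
print(len(root.text))  # 18

# The Expat library version behind pyexpat, e.g. "expat_2.x.y".
print(xml.parsers.expat.EXPAT_VERSION)
```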
Using defusedxml with ElementTree

As noted in the comments, defusedxml is a package aimed directly at protecting against the different types of XML bombs. From its documentation:
Instead of:
```python
from xml.etree.ElementTree import parse
et = parse(xmlfile)
```
alter code to:
```python
from defusedxml.ElementTree import parse
et = parse(xmlfile)
```
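If installing defusedxml is not an option, the same drop-in idea (a parse function that refuses entity declarations instead of expanding them) can be sketched with the standard library alone. This is a swapped-in, hand-rolled technique, not defusedxml's code: EntitiesForbiddenError, _reject_entity_declarations and parse_no_entities are hypothetical names, and the sketch assumes the input is a binary file object. It runs a throwaway Expat pass with an EntityDeclHandler that raises, so a declared entity aborts parsing before anything is expanded.

```python
import io
import xml.etree.ElementTree as ET
import xml.parsers.expat


class EntitiesForbiddenError(ValueError):
    """Hypothetical error type (defusedxml defines its own EntitiesForbidden)."""


def _reject_entity_declarations(data: bytes) -> None:
    """Run a throwaway Expat pass that aborts at the first entity declaration."""
    checker = xml.parsers.expat.ParserCreate()

    def on_entity_decl(name, is_parameter_entity, value, base,
                       system_id, public_id, notation_name):
        # An exception raised in a handler stops Expat and propagates out of Parse().
        raise EntitiesForbiddenError(f"entity {name!r} is declared")

    checker.EntityDeclHandler = on_entity_decl
    checker.Parse(data, True)  # True: this is the final (and only) chunk


def parse_no_entities(source):
    """Hypothetical drop-in for ET.parse() that takes a binary file object."""
    data = source.read()
    _reject_entity_declarations(data)
    # Only reached if no entity was declared, so nothing can be expanded.
    return ET.parse(io.BytesIO(data))


tree = parse_no_entities(io.BytesIO(b"<sheetData><row r='1'/></sheetData>"))
print(tree.getroot().tag)  # sheetData

bomb = b'<!DOCTYPE r [<!ENTITY a "ha">]><r>&a;&a;</r>'
try:
    parse_no_entities(io.BytesIO(bomb))
except EntitiesForbiddenError as exc:
    print("rejected:", exc)
```

The two-pass design buffers the whole document and parses it twice; defusedxml avoids that by installing its handlers on the parser that actually builds the tree, which is one more reason to prefer the package when it is available.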
defusedxml can also patch the standard library itself. Since the standard library's ElementTree is what xlrd uses, combining xlrd with defusedxml lets you read Excel files while protecting yourself from XML bombs. The docs describe this feature as follows:

Additionally the package has an untested function to monkey patch all stdlib modules with defusedxml.defuse_stdlib().