python recursion beautifulsoup runtime-error

BeautifulSoup: RuntimeError: maximum recursion depth exceeded


I can't get rid of Python's maximum recursion depth RuntimeError when using BeautifulSoup.

I'm trying to recurse over nested sections of code and pull out the content. The prettified HTML looks like this (don't ask why it looks like this :)):

<div><code><code><code><code>Code in here</code></code></code></code></div>

The function I'm passing my soup object to is:

def _strip_descendent_code(self, soup):
    sys.setrecursionlimit(2000)
    # soup = BeautifulSoup(html, 'lxml')
    for code in soup.findAll('code'):
        s = ""
        for c in code.descendents:
            if not isinstance(c, NavigableString):
                if c.name != code.name:
                    continue
                elif c.name == code.name:
                    if isinstance(c, NavigableString):
                        s += str(c)
                    else:
                        continue
        code.append(s)
    return str(soup)

You can see I'm trying to increase the default recursion limit, but this is not a solution. I've increased it to the point where C hits the memory limit on my computer, and the function above still never works.

Any help to get this to work and point out the error/s would be much appreciated.

The stack trace repeats this:

  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1234, in find
    l = self.find_all(name, attrs, recursive, text, 1, **kwargs)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1255, in find_all
    return self._find_all(name, attrs, text, limit, generator, **kwargs)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 529, in _find_all
    i = next(generator)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1269, in descendants
    stopNode = self._last_descendant().next_element
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 284, in _last_descendant
    if is_initialized and self.next_sibling:
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 997, in __getattr__
    return self.find(tag)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1234, in find
    l = self.find_all(name, attrs, recursive, text, 1, **kwargs)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1255, in find_all
    return self._find_all(name, attrs, text, limit, generator, **kwargs)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 529, in _find_all
    i = next(generator)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1269, in descendants
    stopNode = self._last_descendant().next_element
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 284, in _last_descendant
    if is_initialized and self.next_sibling:
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 997, in __getattr__
    return self.find(tag)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1234, in find
    l = self.find_all(name, attrs, recursive, text, 1, **kwargs)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1255, in find_all
    return self._find_all(name, attrs, text, limit, generator, **kwargs)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 512, in _find_all
    strainer = SoupStrainer(name, attrs, text, **kwargs)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1548, in __init__
    self.text = self._normalize_search_value(text)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1553, in _normalize_search_value
    if (isinstance(value, str) or isinstance(value, collections.Callable) or hasattr(value, 'match')
RuntimeError: maximum recursion depth exceeded while calling a Python object

Solution

  • I ran into this problem too and read through a lot of web pages about it. Here is a summary of two ways to deal with it.

    First, though, I think we should understand why it happens. Python limits the depth of recursion (the default limit is 1000). You can check the current value with print(sys.getrecursionlimit()). My guess is that BeautifulSoup uses recursion to find child elements. When the recursion goes deeper than the limit, RuntimeError: maximum recursion depth exceeded is raised.
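To make the limit concrete, here is a minimal sketch that uses only the standard library. The helper names depth and exceeds_limit are made up for illustration and have nothing to do with BeautifulSoup's internals; the point is only that each nested call uses one stack frame and that Python raises once the limit is passed.

```python
import sys


def depth(n):
    """Recurse n levels deep and return n; each level uses one stack frame."""
    if n == 0:
        return 0
    return 1 + depth(n - 1)


def exceeds_limit(n):
    """Return True if recursing n levels deep hits the interpreter's limit."""
    try:
        depth(n)
    except RuntimeError:
        # On Python 3.5+ the error is RecursionError, a subclass of RuntimeError.
        return True
    return False


limit = sys.getrecursionlimit()
print(limit)                         # the current recursion limit
print(exceeds_limit(10))             # shallow recursion, well under the limit
print(exceeds_limit(limit + 100))    # deeper than the limit allows
```

Catching RuntimeError works on old and new Python 3 versions alike, which is why the sketch does not name RecursionError directly.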

    First method: use sys.setrecursionlimit() to raise the recursion limit. You can of course set it to something like 1000000, but that may cause a segmentation fault.
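If you do raise the limit, one option is to raise it only around the code that needs it and restore the old value afterwards. The following is a sketch under that assumption; recursion_limit and depth are hypothetical helper names, and the numbers are arbitrary rather than recommended values.

```python
import sys
from contextlib import contextmanager


@contextmanager
def recursion_limit(limit):
    """Temporarily set the recursion limit, restoring the old value on exit."""
    old = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        yield
    finally:
        # Always put the previous limit back, even if the body raised.
        sys.setrecursionlimit(old)


def depth(n):
    """Recurse n levels deep and return n."""
    return 0 if n == 0 else 1 + depth(n - 1)


# 2000 levels would fail under the default limit of 1000.
with recursion_limit(3000):
    print(depth(2000))
```

This keeps the higher limit from leaking into the rest of the program, but it does not remove the risk of a crash if the limit is set far beyond what the C stack can hold.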

    Second method: use try-except. If maximum recursion depth exceeded appears, the algorithm itself may have a problem. Generally speaking, recursion can be replaced with loops. In your case, you could preprocess the HTML with replace() or a regular expression before parsing it.
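As a rough sketch of the preprocessing idea, the nested <code> wrappers from the question can be collapsed with a regular expression before the HTML reaches the parser. The function name collapse_nested_code is made up for this example, and it assumes the tags carry no attributes, have no whitespace between them, and that each wrapper contains nothing but the next <code>. It is not a general HTML fix.

```python
import re


def collapse_nested_code(html):
    """Collapse runs of directly nested <code> tags into a single pair."""
    # Turn "<code><code><code>" into a single "<code>".
    html = re.sub(r"(?:<code>)+", "<code>", html)
    # Turn "</code></code></code>" into a single "</code>".
    html = re.sub(r"(?:</code>)+", "</code>", html)
    return html


html = "<div><code><code><code><code>Code in here</code></code></code></code></div>"
print(collapse_nested_code(html))
```

Because this runs on the raw string with a loop-free substitution, it involves no recursion at all; the flattened result can then be passed to BeautifulSoup as usual.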

    Finally, here is an example.

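    from bs4 import BeautifulSoup
    import sys
    #sys.setrecursionlimit(10000)

    try:
        # Build a document of 1000 <br> tags and parse it.
        doc = ''.join(['<br>' for x in range(1000)])
        soup = BeautifulSoup(doc, 'html.parser')
        a = soup.find('br')
        for i in a:
            print(i)
    except RuntimeError:
        # "maximum recursion depth exceeded" is a RuntimeError.
        print('failed')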

    If you remove the # (so that the recursion limit is raised to 10000), it can print the doc instead of failed.

    Hope this helps.