Tags: python, python-3.x, recursion, defaultdict

Python 3 - RecursionError using defaultdict


What am I trying to achieve here?

I want to read every .txt file in a directory and store them in a defaultdict called documents. Each key of this defaultdict should be the name of a document and its value should be the content of that document.

Note that some of the .txt files are part of the same document (like different pages of a news article): in that case, I want to be able to update documents and append the content of a .txt file if the document already exists in the defaultdict.

In order to do so, I've been implementing this class:

class Document(object):
    '''
    Could be an article, a letter, an interview or whatever
    '''
    def __init__(self):
        self.name = None
        self.text = ''
        self.image = None

    @property 
    def name(self):
        return name

    @name.setter
    def name(self, name):
        self.name = name

    def append_text(self, text):
        self.text += ' ' + text


Then, I use this function to go through all files in a directory and create the defaultdict:

import os
from collections import defaultdict

def get_documents_from(dir_path):

    documents = defaultdict(lambda: Document())

    for filename in [f for f in os.listdir(dir_path) if f.endswith('.txt')]:
        name, _ = parse_filename(filename)
        documents[name].append_text(read_txt(filename))
        documents[name].name = name

    return documents

Here, the function parse_filename helps me get the name of the document being read. The function read_txt returns the content of the file as a string.


When I execute the lines below in main.py:

my_dir = 'path/to/directory'
documents = get_documents_from(my_dir)

I get the following error:

File "lda_TM.py", line 17, in <module>
documents = get_documents_from(my_dir)
  File "/path/to/main.py", line 36, in get_documents_from
documents[name].append_text(read_txt(filename))
  File "/path/to/main.py", line 32, in <lambda>
documents = defaultdict(lambda: Document())
  File "path/to/Document.py", line 8, in __init__
self.name = None
  File "path/to/Document.py", line 19, in name
self.name = name
  File "path/to/Document.py", line 19, in name
self.name = name
  File "path/to/Document.py", line 19, in name
self.name = name
  [Previous line repeated 491 more times]
RecursionError: maximum recursion depth exceeded


I really don't understand why I'm getting this error. Is it because the class Document has not been implemented correctly, or is it because I can't use my own object with a defaultdict?

I know I could probably fix this by using a plain dict and creating a new Document every time I encounter a new name (or updating the Document if the name already exists), but this seems inefficient and a bit unpythonic.

Also, I am aware that creating a defaultdict which uses the name of a document as a key and a Document object (which already embeds that same name) as a value may seem strange. I just thought that if I created a list of Document objects instead of a dict, I would be forced to implement a search function in order to update a Document. Using a defaultdict seemed more efficient (I would convert it to a list soon after reading all the files).

Many thanks for your help and suggestions!


William


Solution

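  • Your class Document has both an instance attribute name and a property name. The property is defined on the class and takes precedence, so every read or write of self.name goes through the property rather than a plain attribute.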

    When you do:

    @property 
    def name(self):
        return name
    

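    First, I assume there is a typo and the getter should read return self.name; as written, return name would raise a NameError. Second, self.name inside the getter refers to the property itself, so the getter would call itself again, and so on. The setter has the same problem, and it is the one shown in your traceback: self.name = None in __init__ invokes the setter, whose body self.name = name invokes the setter again, until the recursion limit is reached.

    The typical solution is to store the value in a differently named attribute such as _name, so that it is not hidden by the property, and to have the getter and setter read and write self._name.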
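
As a minimal sketch of that fix, the class below keeps the value in a backing attribute _name and then uses it with a defaultdict. The pages list is a hypothetical stand-in for the files read from disk (it replaces the question's parse_filename and read_txt, which are not shown), so treat this as an illustration rather than a drop-in replacement for the original script.

```python
from collections import defaultdict


class Document:
    """Could be an article, a letter, an interview or whatever."""

    def __init__(self):
        self._name = None  # backing attribute wrapped by the property below
        self.text = ''
        self.image = None

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, name):
        # Writes the backing attribute, so the setter does not call itself.
        self._name = name

    def append_text(self, text):
        self.text += ' ' + text


# Hypothetical input: (document name, page text) pairs instead of real files.
pages = [('article', 'page one'), ('letter', 'hello'), ('article', 'page two')]

documents = defaultdict(Document)  # the class itself serves as the factory
for name, text in pages:
    documents[name].append_text(text)
    documents[name].name = name

print(documents['article'].name)
print(documents['article'].text)
```

With the backing attribute in place, the two pages of 'article' end up in a single Document, which suggests the defaultdict was never the problem. Passing Document directly as the factory should behave the same as lambda: Document().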