Search code examples
lisptrie

optimizing functions (memory) in lisp - trie data structure


I'm a lisp beginner and I'm trying to write a package that defines a class for a trie and reads the entirety of the scrabble dictionary in to it. The struct acts as a node, each of which has an association list that keeps track of letters that stem from it (leading to other subtries).

Here is my code for the class

(DEFCLASS n-trie ()
  ((word :accessor word
         :initform 'nil
         :initarg :word)
   (word-count :accessor wcount
               :initform 0
               :initarg :wcount)
   (next-letters :accessor next-letters
                 :initform 'nil
                 :initarg :next-letters)))

Here is my add word function

(defun add-word (string trie) ;;iterative method for looping through string
  (let ((s (coerce string 'list))
        (tri trie))
    (dolist (item s)
      (cond
       ((assoc item (next-letters tri))
        (incf (wcount tri))
        (setf tri (cdr (assoc item (next-letters tri)))))
       (t
        (incf (wcount tri))
        (setf (next-letters tri) (acons item (make-instance 'n-trie) (next-letters tri)))
        (setf tri (cdr (assoc item (next-letters tri)))))))
    (setf (word tri) string)))

and here is the function that opens my file (scrabble dictionary) and reads each line

(defun read-words (file trie)
  (let((str (open file)))
    (labels ((rec (tri)
                  (let ((line (read-line str nil nil)))
                    (cond 
                     (line (add-word line tri)
                           (rec tri))
                     (t (close str)
                        trie)))))
      (rec trie))))

Whenever I try to load the entire dictionary, I get a stack overflow. There are over 100k words in the scrabble dictionary, and it's failing at 6000...something is wrong with my memory usage, but I can't seem to tell what.

Is there something that I am doing in these definitions that is inherently expensive memory-wise? I tried making the trie node a struct instead of a class, and got similar results. I also had a recursive algorithm for adding a word from the dictionary, but it was just as bad.

I've been struggling with this for hours, and i'm a little frustrated...


Solution

  • The first thing I would change is the function read-words. It uses tail-recursion and looks like in Scheme. That's not idiomatic in Common Lisp. Use WITH-OPEN-FILE to open a file and use a loop construct to read the lines. If the Common Lisp system does not optimize the tail recursion, this recursion already creates a stack overflow on large text files.

    So:

    • don't use tail recursion, where not necessary and where you know that your CL implementation actually supports it and understands it. For example high debug modes usual prevent tail recursion optimization.

    • use WITH-OPEN-FILE. Don't use OPEN/CLOSE.

    • use IF instead of COND - especially when we deal with a normal true/false predicate.