I'm a lisp beginner and I'm trying to write a package that defines a class for a trie and reads the entirety of the scrabble dictionary in to it. The struct acts as a node, each of which has an association list that keeps track of letters that stem from it (leading to other subtries).
Here is my code for the class
(DEFCLASS n-trie ()
((word :accessor word
:initform 'nil
:initarg :word)
(word-count :accessor wcount
:initform 0
:initarg :wcount)
(next-letters :accessor next-letters
:initform 'nil
:initarg :next-letters)))
Here is my add word function
(defun add-word (string trie) ;;iterative method for looping through string
(let ((s (coerce string 'list))
(tri trie))
(dolist (item s)
(cond
((assoc item (next-letters tri))
(incf (wcount tri))
(setf tri (cdr (assoc item (next-letters tri)))))
(t
(incf (wcount tri))
(setf (next-letters tri) (acons item (make-instance 'n-trie) (next-letters tri)))
(setf tri (cdr (assoc item (next-letters tri)))))))
(setf (word tri) string)))
and here is the function that opens my file (scrabble dictionary) and reads each line
(defun read-words (file trie)
(let((str (open file)))
(labels ((rec (tri)
(let ((line (read-line str nil nil)))
(cond
(line (add-word line tri)
(rec tri))
(t (close str)
trie)))))
(rec trie))))
Whenever I try to load the entire dictionary, I get a stack overflow. There are over 100k words in the scrabble dictionary, and it's failing at 6000...something is wrong with my memory usage, but I can't seem to tell what.
Is there something that I am doing in these definitions that is inherently expensive memory-wise? I tried making the trie node a struct instead of a class, and got similar results. I also had a recursive algorithm for adding a word from the dictionary, but it was just as bad.
I've been struggling with this for hours, and i'm a little frustrated...
The first thing I would change is the function read-words
. It uses tail-recursion and looks like in Scheme. That's not idiomatic in Common Lisp. Use WITH-OPEN-FILE
to open a file and use a loop construct to read the lines. If the Common Lisp system does not optimize the tail recursion, this recursion already creates a stack overflow on large text files.
So:
don't use tail recursion, where not necessary and where you know that your CL implementation actually supports it and understands it. For example high debug modes usual prevent tail recursion optimization.
use WITH-OPEN-FILE
. Don't use OPEN
/CLOSE
.
use IF
instead of COND
- especially when we deal with a normal true/false predicate.