I'am trying to read "words" from a stream. A word is eather just a number "1234", a usual alpha character words "test" and contraction of usual words with apostorph "you're".
Sure there are billions of ways to do this. I'am trying to use tagbody
to implement something like a state maschine to parse the words. My implementation works if the correct input is given but fails for input that doesn't resemple words. I'am trying to skip input until new whitespace or eof is reached, but thats giving me an infinite loop and I don't know why.
Can someone expain it?
Here is my code
(defun whitespace-or-nil-p (c)
"Returns if a character is whitespace"
(member c '(#\ #\Tab #\Return #\Newline nil)))
(defun read-word (stream)
(let ((c nil))
(with-output-to-string (out)
(tagbody
read-initial
(setf c (read-char stream nil))
(cond
((whitespace-or-nil-p c) (peek-char t stream nil) (go read-initial))
((not (alphanumericp c)) (go skip-til-next))
((digit-char-p c) (go read-number))
((alpha-char-p c) (go read-simple-or-contracted-word))
(t (return-from read-word))
)
skip-til-next
(get-output-stream-string out)
(loop until (whitespace-or-nil-p (peek-char nil stream nil)) do (read-char stream nil))
(go read-initial)
read-number
(write-char c out)
(setf c (read-char stream nil))
(cond
((whitespace-or-nil-p c)
(return-from read-word (get-output-stream-string out)))
((not (digit-char-p c)) (go skip-til-next))
(t (go read-number))
)
read-simple-or-contracted-word
(write-char c out)
(setf c (read-char stream nil))
(cond
((whitespace-or-nil-p c)
(return-from read-word (get-output-stream-string out)))
((and (char/= c #\') (not (alpha-char-p c))) (go skip-til-next))
(t (go read-simple-or-contracted-word))
)
))))
Here's your code, modified to prevent infinite loops, in order to debug it. I added comments where the code is changed:
(defun read-word (stream)
(let ((c nil)
;; how many times we allow the code to enter dbg function
(counter 10))
(flet ((dbg (symbol &rest args)
;; each time it is called, we decrease counter, when it
;; reaches zero, we stop the state machine
(print (list* symbol args))
(when (<= (decf counter) 0)
(return-from read-word :too-many-loops))))
(with-output-to-string (out)
(tagbody
read-initial
(setf c (read-char stream nil))
(dbg 'read-initial c)
(cond
((whitespace-or-nil-p c) (peek-char t stream nil) (go read-initial))
((not (alphanumericp c)) (go skip-til-next))
((digit-char-p c) (go read-number))
((alpha-char-p c) (go read-simple-or-contracted-word))
(t (return-from read-word))
)
skip-til-next
(dbg 'skip-til-next)
(get-output-stream-string out)
(loop until (whitespace-or-nil-p (peek-char nil stream nil)) do (read-char stream nil))
(go read-initial)
read-number
(dbg 'read-number)
(write-char c out)
(setf c (read-char stream nil))
(cond
((whitespace-or-nil-p c)
(return-from read-word (get-output-stream-string out)))
((not (digit-char-p c)) (go skip-til-next))
(t (go read-number))
)
read-simple-or-contracted-word
(dbg 'read-simple-or-contracted-word)
(write-char c out)
(setf c (read-char stream nil))
(cond
((whitespace-or-nil-p c)
(return-from read-word (get-output-stream-string out)))
((and (char/= c #\') (not (alpha-char-p c))) (go skip-til-next))
(t (go read-simple-or-contracted-word))
)
)))))
Here is the simplest test case I can imagine:
* (with-input-from-string (in "") (read-word in))
(READ-INITIAL NIL)
(READ-INITIAL NIL)
(READ-INITIAL NIL)
(READ-INITIAL NIL)
(READ-INITIAL NIL)
(READ-INITIAL NIL)
(READ-INITIAL NIL)
(READ-INITIAL NIL)
(READ-INITIAL NIL)
(READ-INITIAL NIL)
(READ-INITIAL NIL)
:TOO-MANY-LOOPS
So you need to handle the case where read-char
returns nil
. Currently, it is treated as whitespace, and the call to peek-char
that happens in that case does not consume characters from the underlying stream (it reached end of file); you could for example observe the return value of peek-char
to avoid going back to the read-initial
label infinitely.
I also have doubts about what (get-output-stream-string out)
is supposed to be doing, especially when you call it without using its return value. I would accept a callback function and call it with each token being read, for example.