I'm trying to make a terminal parser (for a parser combinator) from scratch. My approach is to use regexp-match-positions* on the input string and, if the pattern is found at the first position, output the split string.
This is what I've got so far:
    #lang racket/base
    (require racket/match)

    (define (make-terminal-parser pattern)
      (define (regexp-match-from-start pattern input)
        (match (regexp-match-positions* pattern input)
          [(list (cons 0 x) ...)
           (let ([index (car x)])
             (values (substring input 0 index)
                     (substring input index)))]
          [_ (error "Not found!")]))
      (lambda (input)
        (regexp-match-from-start pattern input)))

    (define ALPHA (make-terminal-parser #rx"[a-zA-Z]"))
    (ALPHA "hello")
My ALPHA doesn't seem to work, and I think it's because the match pattern isn't matching anything. In the REPL, (regexp-match-positions* #rx"[a-zA-Z]" "hello") outputs what I would expect ('((0 . 1) (1 . 2) etc.), so I don't really understand why that doesn't match (list (cons 0 x) ...). If I change the regular expression to #rx"h", then it correctly splits the string, but obviously this is too specific.
(On a related note: I don't understand why I need (car x) to get the actual index value out of the matched cons.)
It turns out the problem was indeed with my pattern matching. I was attempting to match on (list (cons 0 x) ...), but according to the documentation that only matches a list of zero or more elements that each have the form (0 . x) (where x is arbitrary). That's not what I want, because only the first match starts at position 0.
Lists are chains of cons cells, so I changed my pattern to (cons (cons 0 x) _), which constrains only the first element of the list, and that gives me what I want.
That also explains why I had to (car x) in my previous attempt. The x in (list (cons 0 x) ...) matches the right-hand element of every cons in the list, so it is bound to a list. For example, '((0 . 1) (0 . 2) (0 . 3)) would have matched and x would equal '(1 2 3).
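To make the difference between the two patterns concrete, here is a small sketch that can be run as its own module. The names all-positions, ellipsis-binding, ellipsis-on-hello and head-binding are made up for illustration and are not part of the parser.

```racket
#lang racket/base
(require racket/match)

;; Every match position in "hello":
;; '((0 . 1) (1 . 2) (2 . 3) (3 . 4) (4 . 5))
(define all-positions
  (regexp-match-positions* #rx"[a-zA-Z]" "hello"))

;; Under `...`, x collects one value per list element,
;; so it ends up bound to a list rather than a single index.
(define ellipsis-binding
  (match '((0 . 1) (0 . 2) (0 . 3))
    [(list (cons 0 x) ...) x]))            ; '(1 2 3)

;; The same pattern rejects all-positions, because only
;; the first pair has 0 as its left-hand element.
(define ellipsis-on-hello
  (match all-positions
    [(list (cons 0 x) ...) x]
    [_ 'no-match]))                        ; 'no-match

;; Constraining only the head of the list binds x to one index.
(define head-binding
  (match all-positions
    [(cons (cons 0 x) _) x]
    [_ 'no-match]))                        ; 1
```

This is also why #rx"h" appeared to work: it produces the single-element list '((0 . 1)), which the ellipsis pattern accepts with x bound to '(1).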
So, my fixed code is:
    (require racket/match)

    (define (make-terminal-parser pattern)
      (define (regexp-match-from-start pattern input)
        ;; regexp-match-positions returns only the first match (or #f)
        (match (regexp-match-positions pattern input)
          ;; succeed only if that match starts at position 0
          [(cons (cons 0 index) _)
           (values (substring input 0 index)
                   (substring input index))]
          [_ (error "Not found!")]))
      (lambda (input)
        (regexp-match-from-start pattern input)))
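N.B. I also don't need the starred version of regexp-match-positions here, fwiw: only the first match matters, and the non-starred function returns just that (or #f when nothing matches, which falls through to the error clause).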
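For completeness, a usage sketch of the fixed parser. The parser is repeated in condensed form (without the inner helper) so that the example runs on its own; matched, remaining and failed? are names chosen here for illustration. Since the parser returns two values, the caller needs something like define-values to receive them.

```racket
#lang racket/base
(require racket/match)

;; Condensed copy of the fixed parser, same behaviour as above.
(define (make-terminal-parser pattern)
  (lambda (input)
    (match (regexp-match-positions pattern input)
      [(cons (cons 0 index) _)
       (values (substring input 0 index)
               (substring input index))]
      [_ (error "Not found!")])))

(define ALPHA (make-terminal-parser #rx"[a-zA-Z]"))

;; define-values receives both results of the parser at once.
(define-values (matched remaining) (ALPHA "hello"))
;; matched is "h", remaining is "ello"

;; A match that does not start at position 0 is rejected:
;; in "1abc" the first letter is at (1 . 2), so the parser raises.
(define failed?
  (with-handlers ([exn:fail? (lambda (e) #t)])
    (ALPHA "1abc")
    #f))
;; failed? is #t
```

Raising an exception on failure is simply what the code above does; a combinator library might prefer to return a failure value instead, but that is a separate design decision.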