regex, pattern-matching, racket, parser-combinators

Match and split by regular expression from start of string


I'm trying to make a terminal parser (for a parser combinator) from scratch. My approach is to use regexp-match-positions* on the input string and if the pattern is found at the first position, then we output the split string.

This is what I've got, so far:

#lang racket/base

(require racket/match)

(define (make-terminal-parser pattern)
  (define (regexp-match-from-start pattern input)
    (match (regexp-match-positions* pattern input)
      [(list (cons 0 x) ...)
        (let ([index (car x)])
          (values (substring input 0 index)
                  (substring input index)))]

      [_ (error "Not found!")]))

  (lambda (input)
    (regexp-match-from-start pattern input)))

(define ALPHA (make-terminal-parser #rx"[a-zA-Z]"))

(ALPHA "hello")

My ALPHA doesn't work: it falls through to the "Not found!" error, and I think that's because the match pattern isn't matching anything. In the REPL, (regexp-match-positions* #rx"[a-zA-Z]" "hello") outputs what I would expect ('((0 . 1) (1 . 2) etc.)), so I don't really understand why that doesn't match (list (cons 0 x) ...). If I change the regular expression to #rx"h", then it correctly splits the string, but obviously this is too specific.

(On a related note: I don't understand why I need to (car x) to get the actual index value out of the matched cons.)


Solution

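  • It turns out the problem I was having was indeed with my pattern matching. I was attempting to match on (list (cons 0 x) ...), but per the racket/match documentation the ellipsis applies the preceding pattern to every element, so this only matches a list of zero or more elements that all have the form (0 . x) (where x is arbitrary). The result for "hello" starts with (0 . 1) but continues with (1 . 2), so the match fails. That's not what I want.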

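    A non-empty list is itself a cons of its first element and the rest of the list, so I changed my matching criteria to (cons (cons 0 x) _): the first element must be a pair whose car is 0, and the rest of the list is ignored. That gives me what I want.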

    That also explains why I had to (car x) in my previous attempt. Because x sits under the ellipsis in (list (cons 0 x) ...), it is bound to the right-hand elements of all the matched pairs, collected into a list. For example, '((0 . 1) (0 . 2) (0 . 3)) would have matched and x would equal '(1 2 3).
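    The ellipsis behaviour can be checked directly in the REPL. This is only a small sketch; ellipsis-binding is a hypothetical helper name introduced here for illustration, not part of the parser:

    ```racket
    #lang racket/base

    (require racket/match)

    ;; Hypothetical helper: return whatever x is bound to under the
    ;; ellipsis pattern, or 'no-match when the pattern does not apply.
    (define (ellipsis-binding positions)
      (match positions
        [(list (cons 0 x) ...) x]
        [_ 'no-match]))

    ;; Every pair starts with 0, so x collects all the right-hand elements.
    (ellipsis-binding '((0 . 1) (0 . 2) (0 . 3)))                      ; => '(1 2 3)

    ;; #rx"h" yields the single pair (0 . 1), so x is a one-element list.
    (ellipsis-binding (regexp-match-positions* #rx"h" "hello"))        ; => '(1)

    ;; The second pair is (1 . 2), so the whole pattern fails.
    (ellipsis-binding (regexp-match-positions* #rx"[a-zA-Z]" "hello")) ; => 'no-match

    ;; Zero elements also match, leaving x empty.
    (ellipsis-binding '())                                             ; => '()
    ```

    The last case is why (car x) in the original attempt could also fail on input with no match at all: the first clause still matches, but x is the empty list.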

    So, my fixed code is:

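    #lang racket/base

    (require racket/match)

    (define (make-terminal-parser pattern)
      (define (regexp-match-from-start pattern input)
        ;; regexp-match-positions returns the leftmost match first, or #f
        (match (regexp-match-positions pattern input)
          ;; Succeed only when that match starts at index 0
          [(cons (cons 0 index) _)
           (values (substring input 0 index)
                   (substring input index))]

          [_ (error "Not found!")]))

      (lambda (input)
        (regexp-match-from-start pattern input)))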

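    N.B. I also don't need the starred version, regexp-match-positions*, for this, fwiw: the unstarred regexp-match-positions returns only the leftmost match (or #f when there is none), and that is all the pattern has to inspect.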
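
For comparison, here is a short sketch of what the two functions return for the inputs discussed above. The last two lines show one possible alternative that is not part of the answer's code: anchoring the regular expression with ^, so that a match can only ever start at index 0.

```racket
#lang racket/base

;; Starred: every match in the string, as (start . end) pairs.
(regexp-match-positions* #rx"[a-zA-Z]" "hello")
;; => '((0 . 1) (1 . 2) (2 . 3) (3 . 4) (4 . 5))

;; Unstarred: only the leftmost match, or #f when nothing matches.
(regexp-match-positions #rx"[a-zA-Z]" "hello") ; => '((0 . 1))
(regexp-match-positions #rx"[a-zA-Z]" "1abc")  ; => '((1 . 2))
(regexp-match-positions #rx"[a-zA-Z]" "123")   ; => #f

;; Alternative (an assumption, not the approach used above): anchor with ^
;; so the regexp itself rejects matches that do not start the string.
(regexp-match-positions #rx"^[a-zA-Z]" "hello") ; => '((0 . 1))
(regexp-match-positions #rx"^[a-zA-Z]" "1abc")  ; => #f
```

With the unanchored pattern, "1abc" produces a match starting at index 1, which the (cons (cons 0 index) _) clause rejects; with the anchored pattern the same input simply yields #f.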