Search code examples
prologdcg

Converting a small regular expression to a DCG


I understand that Prolog programmers typically use DCGs instead of regular expressions for matching patterns in a string. In Perl, one might write

if ( '... accd' =~ /a+b*c{2,4}d$/ ) {
    say "matched";
}

How would one match the same pattern in Prolog?


Solution

  • I've extended this answer

    :- op(100, xf, *).
    :- op(100, xf, +).
    
    rexp(C) --> [C].
    
    rexp([T|Ts])   --> rexp(T), rexp(Ts).
    rexp([])       --> [].
    
    rexp(eps)      --> [].
    
    rexp(_*)       --> [].
    rexp(R*)       --> rexp(R), rexp(R*).
    
    rexp(R+)       --> rexp(R), rexp(R*).
    
    rexp((R1|R2))  --> ( rexp(R1) ; rexp(R2) ).
    
    rexp(range(R,N,M)) -->
        {between(N,M,L),
         length(D,L),
         maplist(copy_term(R),D)
        }, rexp(D).
    

    then your regular expression match could be

    ?-  phrase(rexp([a+, b*, range(c,2,4), d]), [a,c,c,d]),
        writeln(matched).
    

    Note that in this way we match atoms instead of single characters.

    edit after false' comment, I think that the first clause should read

    rexp(C) --> {atomic(C)}, [C].
    

    to avoid for instance

    ?- phrase(rexp([a+]), [a+]).
    true ;
    

    Indeed, after the correction we have the expected outcome:

    ?- phrase(rexp([a+]), [a+]).
    false.
    

    done edit

    Instead of interpreting regular expressions the pattern could be 'hardcoded' (much easier)

    % I prefer the equivalent clause below
    % p1 --> "a", p1 ; "a", p2.
    p1 --> "a", (p1 ; p2).
    p2 --> "b", p2 ; p3.
    p3 --> ("cc" ; "ccc" ; "cccc"), "d".
    

    then

    ?- phrase(p1, "accd").
    true
    

    here we match single characters (a string in Prolog it's a list of character codes)