Search code examples
prologdcg

Trying to understand parser DCG in prolog


recently I've been playing with DCG in Prolog but I've been facing some issues regarding how exactly does it work. For instance, I have this small grammar:

<atom> :: <letter> <atom_part> | <letter>
<atom_part> :: <letter> | <digit> | <letter> <atom_part> | <digit> <atom_part>
<letter>:: 'a' | 'b' ... |'Z'
<digit> :: '0' |...|'9'

Which, If I'm not mistaken, is any string of letters or numbers that must start with a letter. Anyway, my attempt to parse it is the following:

letter("a") --> "a".
number(X) --> number(X).
...
%etc
programme(I) --> atomm(I).
atomm(C) --> letter(Ch).
atomm(C) --> numb(Ch).
atomm((E)) --> atomm_part(E).
atomm_part(E1,E2) --> atomm(E1),!,atomm(E2).

Here I think is clear that the last 2 lines are wrong. That's really because I'm not sure how to make the 'recursive call' so the parser goes again to check if the next character in the string is a number or a string. How can I correct this? Thanks in advance!

btw, i'm using swi-prolog


Solution

  • Your grammar seems more complex than needed, and you could simplify it using 'epsilon' (empty production, in DCG is []). That apart, you should keep the 'program' more adherent to the specification.

    atom --> letter, atom_part | letter.
    atom_part --> letter | digit | letter, atom_part | digit, atom_part.
    letter --> "a" | "b" | /* omissis... */ "Z".
    digit --> [D], {memberchk(D, "0123456789")}.
    

    You can see how similar is to the original specification. With that

    ?- phrase(atom, "a").
    true ;
    false.
    
    ?- phrase(atom, "3a").
    false.
    
    ?- phrase(atom, "a3").
    true ;
    false.
    

    letter and digit show different ways to match single characters. digit it's simpler if you need to capture values from input, as done in your code. But because enumerating 26*2 characters it's error prone, please consider using code_type/2

    atom(A) --> letter(L), atom_part(P), {A=[L|P]} | letter(L), {A=[L]}.
    atom_part(P) --> letter(L), {P=[L]} | digit(D), {P=[D]} | letter(L), atom_part(A), {P=[L|A]} | digit(D), atom_part(A), {P=[D|A]}.
    letter(L) --> [L], {code_type(L, alpha)}.
    digit(D) --> [D], {memberchk(D, "0123456789")}.
    

    Also consider that usually alternatives in Prolog are encoded in this way

    atom([L|P]) --> letter(L), atom_part(P).
    atom([L]) --> letter(L).
    

    The simpler form allows moving the 'data construction' in head pattern.