recently I've been playing with DCG in Prolog but I've been facing some issues regarding how exactly does it work. For instance, I have this small grammar:
<atom> :: <letter> <atom_part> | <letter>
<atom_part> :: <letter> | <digit> | <letter> <atom_part> | <digit> <atom_part>
<letter>:: 'a' | 'b' ... |'Z'
<digit> :: '0' |...|'9'
Which, If I'm not mistaken, is any string of letters or numbers that must start with a letter. Anyway, my attempt to parse it is the following:
letter("a") --> "a".
number(X) --> number(X).
...
%etc
programme(I) --> atomm(I).
atomm(C) --> letter(Ch).
atomm(C) --> numb(Ch).
atomm((E)) --> atomm_part(E).
atomm_part(E1,E2) --> atomm(E1),!,atomm(E2).
Here I think is clear that the last 2 lines are wrong. That's really because I'm not sure how to make the 'recursive call' so the parser goes again to check if the next character in the string is a number or a string. How can I correct this? Thanks in advance!
btw, i'm using swi-prolog
Your grammar seems more complex than needed, and you could simplify it using 'epsilon' (empty production, in DCG is []
). That apart, you should keep the 'program' more adherent to the specification.
atom --> letter, atom_part | letter.
atom_part --> letter | digit | letter, atom_part | digit, atom_part.
letter --> "a" | "b" | /* omissis... */ "Z".
digit --> [D], {memberchk(D, "0123456789")}.
You can see how similar is to the original specification. With that
?- phrase(atom, "a").
true ;
false.
?- phrase(atom, "3a").
false.
?- phrase(atom, "a3").
true ;
false.
letter
and digit
show different ways to match single characters. digit
it's simpler if you need to capture values from input, as done in your code. But because enumerating 26*2 characters it's error prone, please consider using code_type/2
atom(A) --> letter(L), atom_part(P), {A=[L|P]} | letter(L), {A=[L]}.
atom_part(P) --> letter(L), {P=[L]} | digit(D), {P=[D]} | letter(L), atom_part(A), {P=[L|A]} | digit(D), atom_part(A), {P=[D|A]}.
letter(L) --> [L], {code_type(L, alpha)}.
digit(D) --> [D], {memberchk(D, "0123456789")}.
Also consider that usually alternatives in Prolog are encoded in this way
atom([L|P]) --> letter(L), atom_part(P).
atom([L]) --> letter(L).
The simpler form allows moving the 'data construction' in head pattern.