I want to write a DCG rule that accepts an alphabetic label, a space, a pseudo-label that may contain letters and spaces, another space, another alphabetic label, and finally a period, like this:
label_madness --> label(Table1), " ", label_with_spaces(Rel), " ", label(Table2), ".".
Here's the code for labels:
label(A) --> letters(S), {string_to_atom(S, A)}, !.
label_with_spaces(A) --> letters_or_spaces(S), {string_to_atom(S, A)}, !.
letters([C|D]) --> letter(C), letters(D), !.
letters([C]) --> letter(C), !.
letters_or_spaces([C|D]) --> letter(C), letters_or_spaces(D), !.
letters_or_spaces([C|D]) --> spacehyphen(C), letters_or_spaces(D), !.
letters_or_spaces([C]) --> letter(C), !.
letters_or_spaces([C]) --> spacehyphen(C), !.
letter(C) --> [C], {"a"=<C, C=<"z"}, !.
letter(C) --> [C], {"A"=<C, C=<"Z"}, !.
spacehyphen(E) --> " ", {from_list("-", E)}, !. % spaces are replaced with hyphens in the pseudolabel
from_list([E], E).
Now when I feed label_madness a string like "Alice is responsible for Bob.", it fails. For mysterious reasons trace refuses to work, but I assume it fails because the DCG matches the whole of is responsible for Bob for Rel. I tried with non-space separators between the labels and it works fine. How should I rewrite the label_with_spaces predicate so that it consumes only as much input as required?
The problem in your solution is that you are committing to a parse too early by using the cut (!). When you parse letters_or_spaces you don't know in advance how much input to consume, because the phrase has to stop at the space just before the last label.
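To see the effect of the cut in isolation, here is a minimal sketch, assuming SWI-Prolog with code lists. The rules run_cut//1, run_free//1, demo_cut//1 and demo_free//1 are hypothetical: both run rules read one or more arbitrary codes, longest match first, and differ only in the cut.

```prolog
% Make "..." literals in this file code lists, so " b" is [32, 0'b].
:- set_prolog_flag(double_quotes, codes).

% The cut after the recursive call commits to the first (longest) match.
run_cut([C|Cs]) --> [C], run_cut(Cs), !.
run_cut([C]) --> [C].

% Without the cut, backtracking can retry with shorter matches.
run_free([C|Cs]) --> [C], run_free(Cs).
run_free([C]) --> [C].

% Each run must be followed by a space and the letter b.
demo_cut(X) --> run_cut(X), " b".
demo_free(X) --> run_free(X), " b".
```

With the input a a b, phrase(demo_cut(X), Codes) fails: run_cut//1 commits to consuming all of a a b, leaving nothing for the trailing " b". In contrast, run_free//1 backs off to shorter runs until X is the codes of a a.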
So you should let the Prolog engine backtrack into that predicate, which allows it to select the right phrase in letters_or_spaces. Something like this (showing only the changed clauses, i.e. those with the cut removed):
label(A) --> letters(S), {string_to_atom(S, A)}.
label_with_spaces(A) --> letters_or_spaces(S), {string_to_atom(S, A)}.
letters_or_spaces([C|D]) --> letter(C), letters_or_spaces(D).
letters_or_spaces([C|D]) --> spacehyphen(C), letters_or_spaces(D).
letters_or_spaces([C]) --> letter(C).
letters_or_spaces([C]) --> spacehyphen(C).
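For reference, a sketch of the whole grammar with those changes applied, assuming SWI-Prolog and code lists. A few adjustments are my own and not part of the original: label_madness takes three arguments so the parsed parts are returned, atom_codes/2 replaces string_to_atom/2, character ranges use 0'a-style code literals, spacehyphen returns the hyphen code directly, and parse_sentence/4 is a hypothetical helper. The cut in letters is kept because a label always ends at a space or a period, so taking the longest run of letters is safe.

```prolog
% Make "..." literals in this file code lists.
:- set_prolog_flag(double_quotes, codes).

label_madness(Table1, Rel, Table2) -->
    label(Table1), " ", label_with_spaces(Rel), " ", label(Table2), ".".

label(A) --> letters(S), { atom_codes(A, S) }.
label_with_spaces(A) --> letters_or_spaces(S), { atom_codes(A, S) }.

% Greedy and committed: safe because a label ends at a space or period.
letters([C|D]) --> letter(C), letters(D), !.
letters([C]) --> letter(C).

% No cuts: backtracking can shorten the relation phrase as needed.
letters_or_spaces([C|D]) --> letter(C), letters_or_spaces(D).
letters_or_spaces([C|D]) --> spacehyphen(C), letters_or_spaces(D).
letters_or_spaces([C]) --> letter(C).
letters_or_spaces([C]) --> spacehyphen(C).

letter(C) --> [C], { C >= 0'a, C =< 0'z }.
letter(C) --> [C], { C >= 0'A, C =< 0'Z }.

% A space in the relation phrase is stored as a hyphen.
spacehyphen(0'-) --> " ".

% Hypothetical helper: parse an atom such as 'Alice is responsible for Bob.'
parse_sentence(Text, Table1, Rel, Table2) :-
    atom_codes(Text, Codes),
    phrase(label_madness(Table1, Rel, Table2), Codes).
```

Calling parse_sentence('Alice is responsible for Bob.', T1, R, T2) should give T1 = 'Alice', R = 'is-responsible-for' and T2 = 'Bob'. The engine first tries the longest relation phrase, is-responsible-for-Bob, and then backs off until the remaining input is " Bob.".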
Alternatively, you could change your parser a bit: instead of relying on backtracking, parse everything up to the period in letters_or_spaces and then split the last label off the result.
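A minimal sketch of that alternative, assuming SWI-Prolog (for append/3, memberchk/2 and maplist/3) and code lists. All predicate names here (label_madness_split//3, word//1, run_to_period//1, split_last_label/3, space_to_hyphen/2) are hypothetical. The grammar reads letters and spaces up to the period in a single committed pass, and the final label is split off afterwards at the last space.

```prolog
% Make "..." literals in this file code lists.
:- set_prolog_flag(double_quotes, codes).

label_madness_split(Table1, Rel, Table2) -->
    word(W1), " ", run_to_period(Run), ".",
    { atom_codes(Table1, W1),
      split_last_label(Run, RelCodes, LabelCodes),
      maplist(space_to_hyphen, RelCodes, HyphenCodes),
      atom_codes(Rel, HyphenCodes),
      atom_codes(Table2, LabelCodes) }.

is_letter(C) :- C >= 0'a, C =< 0'z.
is_letter(C) :- C >= 0'A, C =< 0'Z.

% One or more letters, consumed greedily.
word([C|Cs]) --> [C], { is_letter(C) }, word_rest(Cs).
word_rest([C|Cs]) --> [C], { is_letter(C) }, !, word_rest(Cs).
word_rest([]) --> [].

% Letters and spaces (code 32), consumed greedily; stops at the period.
run_to_period([C|Cs]) --> [C], { ( C =:= 32 ; is_letter(C) ) }, !, run_to_period(Cs).
run_to_period([]) --> [].

% Split at the last space: the label part may not contain another space.
split_last_label(Run, RelCodes, LabelCodes) :-
    append(RelCodes, [32|LabelCodes], Run),
    \+ memberchk(32, LabelCodes),
    RelCodes \== [],
    LabelCodes \== [].

% Spaces inside the relation phrase become hyphens, as in the question.
space_to_hyphen(32, 0'-) :- !.
space_to_hyphen(C, C).
```

Here the grammar itself never has to back up. The only search is append/3 looking for the one split point after which no space remains, and that search runs on the already collected codes.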