Tags: parsing, prolog, swi-prolog, dcg

Prolog DCG for parsing escaped sequences


I need to parse the string ^borrow$ ^\$500$ into the list [borrow, $500]. The grammar I have written so far is:

:- use_module(library(dcg/basics)).

write_list([]).
write_list([H|T]) :- atom_codes(S, H), write(S), nl, write_list(T).

% Grammar.
tags([Tag|Rest]) --> string(_), tag(Tag), tags(Rest).
tags([]) --> string(_).
tag(Tag) --> "^", tag_contents(Tag), "$".
tag_contents(Tag) --> string(Tag).

This works when there is no \$ inside a token, but not when there is one:

?- phrase(tags(T), "^pisica$ ^catel$"), write_list(T).
pisica
catel
?- phrase(tags(T), "^borrow$ ^\\$500$"), write_list(T).
borrow
\

What is the best practice for parsing this kind of escape sequence with Prolog DCGs?


Solution

  • The problem is that tag_contents//1 captures just the backslash, and then $ acts as a tag terminator in the parent call.
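
    To see this in isolation, here is a minimal sketch that runs the question's tag//1 on a single escaped tag and reports what was captured and what was left unparsed. The helper first_tag/3 is a hypothetical name introduced only for this illustration. It converts its input with string_codes/2 because in SWI-Prolog 7 and later a double-quoted query argument is a string object rather than a code list.

    ```prolog
    :- use_module(library(dcg/basics)).

    % The question's tag//1, with tag_contents//1 inlined as string//1.
    tag(Tag) --> "^", string(Tag), "$".

    % first_tag(+Text, -Atom, -Rest): take the first solution of tag//1
    % at the start of Text. Atom is the captured tag, Rest the unparsed
    % remainder. (Hypothetical helper, for illustration only.)
    first_tag(Text, Atom, Rest) :-
        string_codes(Text, Codes),
        phrase(tag(Tag), Codes, RestCodes), !,
        atom_codes(Atom, Tag),
        atom_codes(Rest, RestCodes).
    ```

    Calling first_tag("^\\$500$", A, R) binds A to the one-character atom consisting of a backslash and R to '500$'. string//1 tries the shortest match first, so it stops as soon as the next character is a $, whether or not that $ was escaped.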

    Here is an ugly hack around this problem:

    tag(Tag1) -->
       "^", tag_contents(Tag), [C], "$", {C \= 0'\\, append(Tag, [C], Tag1) }.
    

    edit

    a somewhat better one:

    tag(Tag) --> "^", tag_contents(Tag), "$", {\+last(Tag, 0'\\)}.
    

    edit

    'Best practice' is of course to handle nested content with contextual rules. You need more code, though...

    tag(Tag) --> "^", tag_contents(Tag).
    
    tag_contents([0'\\,C|Cs]) --> "\\", [C], !, tag_contents(Cs).
    tag_contents([]) --> "$".
    tag_contents([C|Cs]) --> [C], tag_contents(Cs).
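
    Note that the rules above keep the backslash in the captured tag, while the question asks for [borrow, $500]. As a hedged sketch of one way to get that result, the variant below drops the backslash, commits to the first unescaped $ with a cut, and skips text between tags without library(dcg/basics). The predicate names parse_tags/2 and codes_atom/2 are assumptions made for this example and are not part of the original answer.

    ```prolog
    % tags(-Tags): Tags are the bodies (code lists) of all ^...$ tags.
    tags([Tag|Tags]) --> "^", tag_contents(Tag), !, tags(Tags).
    tags(Tags)       --> [_], !, tags(Tags).   % skip text outside tags
    tags([])         --> [].

    % tag_contents(-Codes): read up to the first unescaped "$".
    % An escaped pair \X contributes only X; the backslash is dropped.
    tag_contents([C|Cs]) --> "\\", [C], !, tag_contents(Cs).
    tag_contents([])     --> "$", !.
    tag_contents([C|Cs]) --> [C], tag_contents(Cs).

    % codes_atom(+Codes, -Atom): atom_codes/2 with the arguments swapped,
    % so that it can be used with maplist/3.
    codes_atom(Codes, Atom) :- atom_codes(Atom, Codes).

    % parse_tags(+Text, -Atoms): Text may be a string, atom or code list.
    parse_tags(Text, Atoms) :-
        string_codes(Text, Codes),
        phrase(tags(Tags), Codes),
        maplist(codes_atom, Tags, Atoms).
    ```

    With these definitions, the query parse_tags("^borrow$ ^\\$500$", L) yields L = [borrow, '$500']. The cuts make the parse deterministic, so a $ can never be re-read as an ordinary character on backtracking. An unterminated tag such as a trailing ^abc is simply skipped.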