Search code examples
prologgrammardcg

Prolog DCG illustration please


Currently I am playing with DCG in Prolog to parse an XML file. I was able to get the following code snippet which is able to parse a simple XML such as:

<author> <name> <f> arthur </f>
        <m> conan </m>
        <l> doyle </l>
     </name>
     <bday> <d> 22 </d>
        <m> 5  </m>
        <y> 1859 </y>
     </bday>
</author>

<author> <name> <f> william </f>
        <l> shakespeare </l>
     </name>
     <bday> <d> 23 </d>
        <m> 4  </m>
        <y> 1564 </y>
     </bday>
</author>
$

and the DCG is defined as:

xml([E]) --> element(E).
xml([E|L]) --> element(E), xml(L).

element(E) -->  begintag(N), elements(L), endtag(N), {E =.. [N|L]}.

elements(L) --> xml(L).
elements([E]) --> [E].

begintag(N) --> ['<', N, '>'].
endtag(N) -->   ['<', '/', N, '>'].

Would someone please illustrate how the DCG works in this case? I am having a really hard time trying to understand the arguments in the DCG (e.g. [E] in xml([E]); [E|L] in xml([E|L]). ) Thank you!


Solution

  • Think declaratively: A DCG always describes a list. In a DCG body, comma (",") is read as "and then". So for example, xml//1 describes (by its first rule) either a single element, or (by its second rule) an element and then something that is again described by xml//1. element//1 is a begintag, then elements, then an endtag etc. The arguments used in the DCG heads let you relate the sequence that is described by the DCG bodies to other information, like a subsequence you are particularly interested in. Take for example begintag//1 - you can ask:

    ?- phrase(begintag(T), [<, test, >]).
    T = test.
    

    and also in the other direction:

    ?- phrase(begintag(test), Ls).
    Ls = [<, test, >].
    

    In the case of begintag//1, the tag name seems to be important to you, so you have introduced an argument for it that, on the one hand, lets you generate an opening tag with a given name, and, on the other hand, lets you parse a tag and extract its name, and, on the third hand, ask even the most general query:

    ?- phrase(begintag(T), Ls).
    Ls = [<, T, >].
    

    which abstractly relates an opening XML tag to its name.