
Parsing string literals in Prolog


I am using definite clause grammars to parse string literals in Prolog, but this grammar rule can only parse string literals that contain alphabetic characters:

string_literal(S) --> "\"", symbol(S), "\"".
symbol([L|Ls]) --> letter(L), symbol_r(Ls).
symbol_r([L|Ls]) --> letter(L), symbol_r(Ls).
symbol_r([])     --> [].
letter(Let)     --> [Let], { code_type(Let, alpha) }.

Is it possible to write a DCG rule that can parse string literals with other types of symbols?


Solution

  • In SWI-Prolog, library(dcg/basics) provides several ready-to-use nonterminals. Its source code is worth studying.
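
    As a minimal sketch of that route, assuming SWI-Prolog 7 or later and the string_without//2 nonterminal from library(dcg/basics), a literal can be read as every code up to the next double quote. Escape sequences inside the literal are not handled here.

    ```prolog
    :- use_module(library(dcg/basics)).

    % string_literal(-Codes)//: a double-quoted literal, no escape handling.
    % string_without//2 takes codes up to, but not including, the first
    % code that appears in its first argument (here only the double quote).
    string_literal(Codes) -->
        "\"",
        string_without([0'"], Codes),
        "\"".
    ```

    For example, ?- phrase(string_literal(Cs), `"a1 +b"`), atom_codes(A, Cs). binds A to 'a1 +b'.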

    Otherwise, to generalize a bit, you could pass the code type to the matching rule, then combine the primitives at will:

    char(Type, C) --> [C], { code_type(C, Type) }.
    
    letter(L) --> char(alpha, L).
    digit(D) --> char(digit, D).
    lower_or_num(C) --> char(lower, C) | digit(C).
    ...
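
    To show how these primitives could be plugged back into the grammar from the question, here is a sketch that accepts letters, digits and whitespace inside the quotes. The names literal_char//1 and literal_chars//1 are made up for this illustration, and the set of accepted code types is only an assumption about what you may want to allow.

    ```prolog
    char(Type, C) --> [C], { code_type(C, Type) }.

    % literal_char//1 (hypothetical name): one letter, digit or whitespace code.
    literal_char(C) --> char(alnum, C) | char(space, C).

    string_literal(Cs) --> "\"", literal_chars(Cs), "\"".

    % Greedy: try to take another character before accepting the empty rest.
    literal_chars([C|Cs]) --> literal_char(C), literal_chars(Cs).
    literal_chars([])     --> [].
    ```

    With these rules, ?- phrase(string_literal(Cs), `"abc 123"`), atom_codes(A, Cs). binds A to 'abc 123', while a literal containing punctuation such as `"a+b"` is rejected.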
    

    Another possibility is to accept any character and skip over the unwanted ones (here, only newlines and single quotes):

    string_literal(S) --> "\"", string_inner(S).
    
    string_inner([]) --> "\"".
    string_inner(Cs) --> [C],
        { ( C == 0'\n ; C == 0'' ) -> Cs = Rs ; Cs = [C|Rs] },
        string_inner(Rs).
    

    Edit

    To prevent it from matching strings that contain double quotes:

    The construct ( If -> Then ; Else ) fails when the Else branch is omitted and If is false, so one attempt could be:

    ...
    { ( C == 0'\n ; C == 0'' ) -> Cs = Rs ; C \== 0'" -> Cs = [C|Rs] },
    ...
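
    Put together, the attempt could look like the sketch below (tested only mentally against SWI-Prolog syntax, so treat it as a starting point). The single quote is written 0'\' here, the escaped spelling of the same code as 0'' above.

    ```prolog
    string_literal(S) --> "\"", string_inner(S).

    % The closing double quote ends the literal.
    string_inner([]) --> "\"".
    string_inner(Cs) --> [C],
        {   (   ( C == 0'\n ; C == 0'\' )
            ->  Cs = Rs              % skip newlines and single quotes
            ;   C \== 0'"
            ->  Cs = [C|Rs]          % keep every other code except "
            )                        % no else branch: fails when C is "
        },
        string_inner(Rs).
    ```

    With this version, ?- phrase(string_literal(S), `"a'b"`), atom_codes(A, S). binds A to ab, and ?- phrase(string_literal(S), `"a"b"`). fails, because the second clause can no longer consume a double quote on backtracking.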