I am using definite clause grammars to parse string literals in Prolog, but this grammar rule can only parse string literals that contain alphabetic characters:
string_literal(S) --> "\"", symbol(S), "\"".
symbol([L|Ls]) --> letter(L), symbol_r(Ls).
symbol_r([L|Ls]) --> letter(L), symbol_r(Ls).
symbol_r([]) --> [].
letter(Let) --> [Let], { code_type(Let, alpha) }.
Is it possible to write a DCG rule that can parse string literals with other types of symbols?
In SWI-Prolog, library(dcg/basics) has several ready-to-use nonterminals. Its source code is worth studying.
Otherwise, to generalize a bit, you could pass the code type to the matching rule and then combine the primitives at will:
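As a minimal sketch of the library route, assuming SWI-Prolog and a literal without escape sequences, string_without//2 from library(dcg/basics) collects every code up to, but not including, the closing quote. The input in the example query is made up for illustration.

```prolog
:- use_module(library(dcg/basics)).   % provides string_without//2

% A double-quoted literal: opening quote, any codes except '"', closing quote.
% Assumption: escape sequences such as \" are NOT handled in this sketch.
string_literal(Codes) -->
    "\"",
    string_without([0'"], Codes),
    "\"".

% Example query:
% ?- atom_codes('"a1 b!"', In), phrase(string_literal(S), In), atom_codes(A, S).
```

The query should bind A to 'a1 b!', so digits, spaces and punctuation are accepted as well as letters.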
char(Type, C) --> [C], { code_type(C, Type) }.
letter(L) --> char(alpha, L).
digit(D) --> char(digit, D).
lower_or_num(C) --> char(lower, C) | digit(C).
...
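To show how these primitives might be combined for the string literal in the question, here is a rough sketch. The names symbol_char//1 and symbols//1 are made up for this example, and the choice of allowed classes (alnum, space, and punct except the double quote) is only an assumption about what the literal should accept. Unlike the rule in the question, it also accepts the empty literal.

```prolog
% Generic matcher: one code C belonging to the given code_type/2 class.
char(Type, C) --> [C], { code_type(C, Type) }.

% Hypothetical helper: a code allowed inside the literal, i.e. a letter,
% a digit, white space, or punctuation other than the double quote.
symbol_char(C) --> char(alnum, C).
symbol_char(C) --> char(space, C).
symbol_char(C) --> char(punct, C), { C \== 0'" }.

% Zero or more allowed codes, longest match first.
symbols([C|Cs]) --> symbol_char(C), symbols(Cs).
symbols([])     --> [].

string_literal(S) --> "\"", symbols(S), "\"".

% Example query:
% ?- atom_codes('"ab 12, c!"', In), phrase(string_literal(S), In), atom_codes(A, S).
```

Because the double quote is excluded from symbol_char//1, the literal can only end at the first closing quote.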
Another possibility is to accept any character and skip over the unwanted ones (here only newlines and single quotes):
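string_literal(S) --> "\"", string_inner(S).
% The closing quote ends the literal.
string_inner([]) --> "\"".
% Otherwise consume one code: drop newlines and single quotes, keep the rest.
string_inner(Cs) --> [C],
    { ( C == 0'\n ; C == 0'' ) -> Cs = Rs ; Cs = [C|Rs] },
    string_inner(Rs).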
Edit: to prevent it from matching strings that contain double quotes, note that the construct if -> then ; else fails if we omit the else branch and the if condition is false, so an attempt could be:
...
{ ( C == 0'\n ; C == 0'' ) -> Cs = Rs ; C \== 0'" -> Cs = [C|Rs] },
...
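Put together with the rules shown earlier, the whole grammar might look like the sketch below. It assumes SWI-Prolog's reading of 0'' as the single quote code; the comments are added here for illustration.

```prolog
string_literal(S) --> "\"", string_inner(S).

% The closing quote ends the literal.
string_inner([]) --> "\"".
string_inner(Cs) --> [C],
    { (   ( C == 0'\n ; C == 0'' ) -> Cs = Rs      % skip newline and single quote
      ;   C \== 0'"                -> Cs = [C|Rs]  % keep anything but a double quote
      ) },                                          % no else branch: a double quote fails
    string_inner(Rs).

% Example queries:
% ?- atom_codes('"a''b"', In), phrase(string_literal(S), In), atom_codes(A, S).
% ?- atom_codes('"ab"cd"', In), phrase(string_literal(S), In).
```

The first query should give A = ab, since the single quote is skipped. The second should fail, because the second clause can no longer consume the embedded double quote as an ordinary character.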