Search code examples
prologoperatorsstring-concatenation

How to define a string concatenation operator in prolog?


I am manipulating strings of characters in prolog, and I would like to avoid the use of too many temporary variables in my rules.

I would like to transform something like this:

process_str(Str, Next) :-
    is_valid_pattern0(Pattern0),
    concat(Pattern0, Tail0, Str),
    concat("->", Tail1, Tail0),
    is_valid_pattern1(Pattern1),
    concat(Pattern1, Tail2, Tail1),
    concat("|", Tail2, Next).

using a concatenation operator definition, into something like:

process_str(Pattern0.."->"..Pattern1.."|"..Next, Next) :-
    is_valid_pattern0(Pattern0),
    is_valid_pattern1(Pattern1).

which I believe would be far more readable, at the expense of a few more operations depending on how the operator is defined.

I found that the documentation talks about defining operators, but as far as I understand, one can only define predicate operators, not functional operators that can "return a value" (like for instance the + operator).

Please tell me either why I am wrong or how to define such a concatenation operator.


Solution

  • Here's a solution using DCG and term_expansion that works in SWI-Prolog. First, the basics:

    :- set_prolog_flag(double_quotes, chars).
    

    This ensures that "foo" will be interpreted as a list of three characters, not some non-standard atomic "string" object.

    Then, let's assume that valid pattern0 and pattern1 matches are simply lists of letters. It's up to you to fill in the details. Some basic DCGs:

    letters -->
        [].
    letters -->
        [Letter],
        { char_type(Letter, alpha) },
        letters.
    
    pattern0 -->
        letters.
        
    pattern1 -->
        letters.
    

    For example:

    ?- phrase(pattern0, Pattern0).
    Pattern0 = [] ;
    Pattern0 = ['A'] ;
    Pattern0 = ['A', 'A'] ;
    Pattern0 = ['A', 'A', 'A'] ;
    Pattern0 = ['A', 'A', 'A', 'A'] ;
    Pattern0 = ['A', 'A', 'A', 'A', 'A'] .
    
    ?- phrase(pattern0, "helloworld").
    true.
    

    Also, the handy DCG describing simply a list:

    list([]) -->
        [].
    list([X | Xs]) -->
        [X],
        list(Xs).
    

    This doesn't seem to do much, but it will come in handy in a moment:

    ?- phrase(list([a, b, c]), List).
    List = [a, b, c].
    
    ?- phrase(list(List), [a, b, c]).
    List = [a, b, c] ;
    false.
    

    Now, you would like to define a composite pattern like Pattern0.."->"..Pattern1.."|"..Next. I suggest to write this a bit differently, namely as a list of sub-patterns: [pattern0, "->", pattern1, "|", Next]. Such a list may contain three kinds of elements:

    • DCG rule names
    • literal lists of characters
    • variables that may become bound to lists of characters

    We can then write a DCG matching some composite patterns:

    composite([]) -->
        [].
    composite([Head | Tail]) -->
        { atom(Head) },
        % Assume that this atom is the name of another DCG rule, and execute it.
        call(Head),
        composite(Tail).
    composite([Head | Tail]) -->
        list(Head),
        composite(Tail).
    

    This expresses that a composite pattern just describes a sequence of whatever its sub-patterns describe. It only has two clauses dealing with sub-patterns: One for DCG rule names (represented by atoms) and one for character lists. The case of variables is handled automatically by the character list clause!

    We can use this definition to match a character list like "foo->bar|baz" against a composite pattern:

    ?- phrase(composite([pattern0, "->", pattern1, "|", Next]), "foo->bar|baz").
    Next = [b, a, z] ;
    false.
    

    Almost done! We can pack this up in a definition encapsulating the pattern:

    process_str(Sequence, Next) :-
        phrase(composite([pattern0, "->", pattern1, "|", Next]), Sequence).
    

    This works like this:

    ?- process_str("foo->bar|baz", Next).
    Next = [b, a, z] ;
    false.
    

    I think this is already pretty good. But if you really want a kind of pattern matching syntax, term_expansion will help. Its use is (deceptively) simple: Define a clause for term_expansion(SomeTermPattern, SomeOtherTerm), and every clause definition matching SomeTermPattern will be treated as if the programmer had written SomeOtherTerm instead. So:

    term_expansion(
        % Replace every definition of this form:
        patterned_process_str(Pattern, Next),
        % by a replacement like this:
        patterned_process_str(Sequence, Next) :-
            phrase(composite(Pattern), Sequence)
    ).
    
    patterned_process_str([pattern0, "->", pattern1, "|", Next], Next).
    

    We can look at Prolog's internal representation of the source code for patterned_process_str to make sure that it is as expected:

    ?- listing(patterned_process_str).
    patterned_process_str(B, A) :-
        phrase(composite([pattern0, [-, >], pattern1, ['|'], A]), B).
    

    Variable names are lost, but otherwise our definition for patterned_process_str was expanded to the form we wanted, namely the same form that we wrote for process_str above. This definition works exactly like process_str above (since it is equivalent):

    ?- patterned_process_str("foo->bar|baz", Next).
    Next = [b, a, z] ;
    false.
    

    Exercise: Provide an operator definition for ... Write a predicate pattern_list that converts between "dotted patterns" with .. and "list patterns", for example: pattern_list(A..B..C, [A, B, C]) should succeed. Then, expand the above term_expansion rule in a way that allows you to write patterned_process_str using the "dotted pattern" syntax directly.