Search code examples
prologswi-prologdcg

In Prolog DCGs, how to remove over general solutions?


I have a text file containing a sequence. For example:

GGGGGGGGAACCCCCCCCCCTTGGGGGGGGGGGGGGGGAACCCCCCCCCCTTGGGGGGGG

I have wrote the following DCG to find the sequence between AA and TT.

:- use_module(library(pio)).
:- use_module(library(dcg/basics)).
:- portray_text(true).

process(Xs) :- phrase_from_file(find(Xs), 'string.txt').

anyseq([]) -->[].
anyseq([E|Es]) --> [E], anyseq(Es).

begin --> "AA".
end -->"TT".

find(Seq) -->
     anyseq(_),begin,anyseq(Seq),end, anyseq(_).

I query and I get:

?- process(Xs).
 Xs = "CCCCCCCCCC" ;
 Xs = "CCCCCCCCCCTTGGGGGGGGGGGGG...CCCCC" ;
 Xs = "CCCCCCCCCC" ;
false.

But I dont want it to find the second solution or ones like it. Only the solutions between one pair of AA and TTs not all combinations. I have a feeling I could use string_without and string in library dcg basiscs but I dont understand how to use them.


Solution

  • your anyseq//1 is identical to string//1 from library(dcg/basics), and shares the same 'problem'.

    To keep in control, I would introduce a 'between separators' state:

    elem(E) --> begin, string(E), end, !.
    
    begin --> "AA".
    end -->"TT".
    
    find(Seq) -->
         anyseq(_),elem(Seq).
    
    anyseq([]) -->[].
    anyseq([E|Es]) --> [E], anyseq(Es).
    
    process(Xs) :-
       phrase(find(Xs), `GGGGGGGGAACCCCCCCCCCTTGGGGGGGGGGGGGGGGAACCCCC+++CCCCCTTGGGGGGGG`,_).
    

    now I get

    ?- process(X).
    X = "CCCCCCCCCC" ;
    X = "CCCCC+++CCCCC" ;
    false.
    

    note the anonymous var as last argument of phrase/3: it's needed to suit the change in 'control flow' induced by the more strict pattern used: elem//1 is not followed by anyseq//1, because any two sequences 'sharing' anyseq//1 would be problematic.

    In the end, you should change your grammar to collect elem//1 with a right recursive grammar....