I have a text file containing a sequence. For example:
GGGGGGGGAACCCCCCCCCCTTGGGGGGGGGGGGGGGGAACCCCCCCCCCTTGGGGGGGG
I have wrote the following DCG to find the sequence between AA and TT.
:- use_module(library(pio)).
:- use_module(library(dcg/basics)).
:- portray_text(true).
process(Xs) :- phrase_from_file(find(Xs), 'string.txt').
anyseq([]) -->[].
anyseq([E|Es]) --> [E], anyseq(Es).
begin --> "AA".
end -->"TT".
find(Seq) -->
anyseq(_),begin,anyseq(Seq),end, anyseq(_).
I query and I get:
?- process(Xs).
Xs = "CCCCCCCCCC" ;
Xs = "CCCCCCCCCCTTGGGGGGGGGGGGG...CCCCC" ;
Xs = "CCCCCCCCCC" ;
false.
But I dont want it to find the second solution or ones like it. Only the solutions between one pair of AA and TTs not all combinations. I have a feeling I could use string_without
and string
in library dcg basiscs but I dont understand how to use them.
your anyseq//1 is identical to string//1 from library(dcg/basics), and shares the same 'problem'.
To keep in control, I would introduce a 'between separators' state:
elem(E) --> begin, string(E), end, !.
begin --> "AA".
end -->"TT".
find(Seq) -->
anyseq(_),elem(Seq).
anyseq([]) -->[].
anyseq([E|Es]) --> [E], anyseq(Es).
process(Xs) :-
phrase(find(Xs), `GGGGGGGGAACCCCCCCCCCTTGGGGGGGGGGGGGGGGAACCCCC+++CCCCCTTGGGGGGGG`,_).
now I get
?- process(X).
X = "CCCCCCCCCC" ;
X = "CCCCC+++CCCCC" ;
false.
note the anonymous var as last argument of phrase/3: it's needed to suit the change in 'control flow' induced by the more strict pattern used: elem//1 is not followed by anyseq//1, because any two sequences 'sharing' anyseq//1 would be problematic.
In the end, you should change your grammar to collect elem//1 with a right recursive grammar....