performance optimization prolog stack-overflow

How to count coincidences on each character of two large strings without triggering the Out of Local Stack exception?

I need a clause that counts char coincidences between two large strings but omitting '_' coincidences. I have this code:

fit(GEN1, GEN2, N, N) :-
   length(GEN1, L1),
   length(GEN2, L2),
   0 is L1*L2.

fit([P1|R1], [P2|R2], N, TOTAL) :-
   member(P1, ['_',a,c,t,g]),
   member(P2, ['_',a,c,t,g]),
   append([P1],[P2],T),
   (  member(T,[[a,a],[c,c],[t,t],[g,g]])
   -> X is N+1
   ;  X is N
   ),
   fit(R1,R2,X,TOTAL).

Where GEN1 and GEN2 are lists containing all characters large strings.

I've tried increasing the stack limit to avoid Out of Local Stack exception with little success.

The issue is that, is called often and in deep recursive clauses. Is there any better way to do this?

EDIT

The clause needs to stop when one or both lists are empty.

EDIT 2

Is worth saying that testings on all answers below were done using 64bit prolog, with the --stack-limit=32g option as my code isn't well optimized and the fit clause is a small part of a larger process, but was the main problem with my code.

EDIT 3

CapelliC code worked using the less resources.

false code using the library(reif) v2 worked the faster.

See Complexity of counting matching elements in two sequences using library(aggregate) for more proposed solutions.

Solution

if your Prolog has library(aggregate) you can do

fit(GEN1, GEN2, N) :-
  aggregate_all(count, (nth1(P,GEN1,S),nth1(P,GEN2,S),memberchk(S,[a,c,g,t])), N).

edit

Depending on the statistic of data, a noticeable improvement can be obtained just swapping the last two calls, i.e. ...(nth1(P,GEN1,S),memberchk(S,[a,c,g,t]),nth1(P,GEN2,S))...

edit

Of course a tight loop it's better that a double indexed scan. For performance, I would write it like

fit_cc(GEN1, GEN2, N) :-
  fit_cc(GEN1, GEN2, 0, N).

fit_cc([X|GEN1], [Y|GEN2], C, N) :-
  (  X\='_' /*memberchk(X, [a,c,g,t])*/, X=Y
  -> D is C+1 ; D=C
  ),
  fit_cc(GEN1, GEN2, D, N).
fit_cc(_, _, N, N).

but the generality and correctness allowed by library(reif) v2, as seen in @false' answer and comments, seems to be well worth the (pretty small) overhead.