haskell recursion functional-programming predicate backtracking

Haskell backtracking

Two friends P1 and P2 send the SAME message M to a mutual friend say P3.

However due to some network damage P3 only receives only one character at a time without knowing if the character received belongs to P1 or P2.

Furthermore P3 might receive X characters from P1 then Y characters from P2 or vice versa but whatever the order P3 will receive ALL characters that both P1 and P2 sent.

Given the sequence S of characters that P3 received help him determine the initial message M that consists only of 0s and 1s

Note that there might be more than one solutions to the problem however getting just one is fine.

Examples :

1) S = [0,1,0,0,1,0] then M = "010"

2) S = [0,0,1,1,0,0,1,1,0,0] then M = "01010" or M = "00110"

To clarify the order and the ownership of each character :

Say M = "cat" then S might be :
    
    1) [c1,c2,a2,t2,a1,t1]

    2) [c1,a1,t1,c2,a2,t2]
    
    3) [c1,c2,a1,a2,t2,t1]

Where xi stands for : Character x belongs to person i.

Given the fact that P1 and P2 send the same message then :

There is a fixed amount of 0s that P1 and P2 can send
There is also a fixed amount of 1s that P1 and P2 can send
Length of M will obviously be an even number

At first I implemented the predicate above using Prolog and A's (0) and B's (1) where backtracking is fairly easy and I applied a constraint that prunes my search tree so that my approach is not a brute force one :

Prolog Code :

countCharacters([],A,B,A,B).

countCharacters([C|T],A,B,X,Y) :-                           % Count A's per person and B's per person
       (C == a -> A1 is A + 1,countCharacters(T,A1,B,X,Y);
        B1 is B + 1,countCharacters(T,A,B1,X,Y)).

countCharacters(L,A,B) :-
    countCharacters(L,0,0,X,Y),
    A is X / 2,
    B is Y / 2.

rightOrder([],_) :- !.

rightOrder(_,[]) :- !.

rightOrder([C1|_],[C2|_]) :- C1 \= C2,!,false.

rightOrder([C|T1],[C|T2]) :-                   % Constraint that checks if two lists have the same order
        rightOrder(T1,T2).

determine([],M1,M2,_,_,_,_,M1) :- M1 == M2,!.

determine(L,M1,M2,A1,B1,A2,B2,X) :-
            A1 == 0,
            B1 == 0,
            append(M2,L,NM2),
            rightOrder(M1,NM2),
            determine([],M1,NM2,A1,B1,A2,B2,X).

determine([a|T],M1,M2,A1,B1,A2,B2,X) :-
            A1 > 0,
            NA1 is A1 - 1,
            append(M1,[a],NM1),
            determine(T,NM1,M2,NA1,B1,A2,B2,X).

determine([b|T],M1,M2,A1,B1,A2,B2,X) :-
            B1 > 0,
            NB1 is B1 - 1,
            append(M1,[b],NM1),
            determine(T,NM1,M2,A1,NB1,A2,B2,X).

determine([a|T],M1,M2,A1,B1,A2,B2,X) :-
            A2 > 0,
            NA2 is A2 - 1,
            append(M2,[a],NM2),
            rightOrder(M1,NM2),
            determine(T,M1,NM2,A1,B1,NA2,B2,X).

determine([b|T],M1,M2,A1,B1,A2,B2,X) :-
            B2 > 0,
            NB2 is B2 - 1,
            append(M2,[b],NM2),
            rightOrder(M1,NM2),
            determine(T,M1,NM2,A1,B1,A2,NB2,X).

determine(L,M) :-
    countCharacters(L,AS,BS),
    determine(L,[],[],AS,BS,AS,BS,M).

The code above is not that optimized as I've been studying Prolog for just a few weeks now, however I need some help or insight on how to implement the same predicate in Haskell as I have no clue on how to backtrack.

If you need more clarifications let me know.

Solution

An inefficient way to do this in Haskell would be with the list monad, which simulates nondeterminism.

One way to arrive at a solution is to consider the problem from the opposite direction: how would you generate the possible ways the message could have been interleaved? Essentially for every element in the output, there will have been a choice between taking it from one sender or the other, or all the remaining elements will come from the same sender if one has run out of elements. Expressed literally:

-- Compute all the possible interleavings of a list with itself.
interleavings :: [a] -> [[a]]
interleavings xs0 = go xs0 xs0
  where

    -- If the first list has run out,
    -- return the remainder of the second.
    go [] rs = pure rs

    -- And vice versa.
    go ls [] = pure ls

    -- If both lists are nonempty:
    go ls@(l : ls') rs@(r : rs') = do

      -- Toss a coin;
      choice <- [False, True]

      case choice of

        -- If tails, take an element from the left sender
        -- and prepend it to all possible remaining interleavings.
        False -> fmap (l :) (go ls' rs)

        -- If heads, take from the right sender.
        True -> fmap (r :) (go ls rs')

Note that this generates many duplicate entries, since it doesn’t backtrack or prune:

> interleavings "10"
["1010","1100","1100","1100","1100","1010"]

However, it does point the way to the start of a solution. You want to run the above process in reverse: given an interleaving, generate a series of choices and assume that each element came from the assumed list, keeping track of the deinterleaved lists. If they’re equal at the end, then they represent a valid deinterleaving:

-- The possible deinterleavings of a list
-- whose elements can be compared for equality.
deinterleavings :: (Eq a) => [a] -> [[a]]

-- Begin searching assuming no elements have been sent by either sender.
deinterleavings xs0 = go [] [] xs0
  where

    -- If there is an element remaining:
    go ls rs (x : xs) = do

      -- Toss a coin;
      choice <- [False, True]

      case choice of

        -- If tails, assume it came from the left sender and proceed.
        -- (Note that this accumulates in reverse, adding to the head.)
        False -> go (x : ls) rs xs

        -- If heads, assume the right sender.
        True -> go ls (x : rs) xs

    -- If there are no elements remaining:
    go ls rs [] = do

      -- Require that the accumulated messages be identical.
      guard (ls == rs)

      -- Return the (de-reversed) message.
      pure (reverse ls)

Again this is extremely inefficient:

> deinterleavings "0011001100"
["00110","00110","01100","01010","01010","01010","01010","01010","01010","01010","01010","01010","01010","01010","01010","01010","01010","01010","01010","01100","01100","01010","01010","01010","01010","01010","01010","01010","01010","01010","01010","01010","01010","01010","01010","01010","01010","01100","00110","00110"]

But I hope it illustrates the general structure of a solution that you can improve upon.

Consider how you could introduce guards earlier, or accumulate elements differently to prune the search; or use a different monad that does backtracking like Logic; or maintain a stateful set of results with State (or even IO) so that you can check during the computation which results you’ve already seen. Also consider how you could approach the problem from another angle entirely, based on the fact that the interleaved message contains the same string twice as subsequences, since there are standard efficient memoised algorithms for the “longest common subsequence” and “longest repeating subsequence”.