Search code examples
prolog

How to generate all words of length N from an alphabet with condition


Generate all words of length N from an alphabet {a,b,c,d,e,f} where 4 letters appear twice and all the other letters appear once or not appearing at all.

I found this code that generates words from an alphabet

letter(X) :- member(X, [a, b, c, d, e, f]).

word(0, []).
word(N, [C|W]) :-
    N > 0,
    N1 is N-1,
    letter(C),
    word(N1, W).

words(N, L) :-
    findall(W, word(N, W), Ws),
    maplist(atomic_list_concat, Ws, L).

I desided to go throw all of them and pick the ones that match my conditions with counting occurences of letters

count_occurrences(List, Occ):-
    findall([X,L], (bagof(true,member(X,List),Xs), length(Xs,L)), Occ).

but it's taking too much time, what other solutions can be found?


Solution

  • I tried to make this solution as efficient as I could, at least for a small evening puzzle. Suggestions for improvement are welcome.

    Generate all words of length N from an alphabet {a,b,c,d,e,f} where 4 letters appear twice and all the other letters appear once or not appearing at all.

    My reading is that this implies that any solution will have a length of 8 to 10: 4 * 2 characters must appear, and there are up to two other optional characters. My solution is structured as first constructing a "base word" consisting of those optional characters. Into this base word we then insert two copies each of four other characters.

    So, this double insertion first:

    insert2(Xs, Y) -->
        [Y],
        insert(Xs, Y).
    insert2([X | Xs], Y) -->
        [X],
        insert2(Xs, Y).
    
    insert(Xs, Y) -->
        [Y],
        list(Xs).
    insert([X | Xs], Y) -->
        [X],
        insert(Xs, Y).
    
    list([]) -->
        [].
    list([X | Xs]) -->
        [X],
        list(Xs).
    

    For example:

    ?- phrase(insert2([b], a), List).
    List = [a, a, b] ;
    List = [a, b, a] ;
    List = [b, a, a].
    

    Note that this is somewhat carefully written to avoid leaving a choice point at the end (in SWI-Prolog). More importantly, it is also written to avoid duplicate solutions; if you implement double insertion as "insert into a list, then insert into that result", you will get duplicates, and in an exponential algorithm those will hurt.

    With that, base words are:

    baseword([]).
    baseword([_X]).
    baseword([_X, _Y]).
    

    and the predicate describing the structure of all possible solutions is:

    puzzle_(CharacterPlaceholders) :-
        baseword(BaseWord),
        phrase(insert2(BaseWord, _A), WordWith2A),
        phrase(insert2(WordWith2A, _B), WordWith2A2B),
        phrase(insert2(WordWith2A2B, _C), WordWith2A2B2C),
        phrase(insert2(WordWith2A2B2C, _D), WordWith2A2B2C2D),
        CharacterPlaceholders = WordWith2A2B2C2D.
    

    For example:

    ?- puzzle_(List).
    List = [_2278, _2278, _2230, _2230, _2194, _2194, _2170, _2170] ;
    List = [_2278, _2230, _2278, _2230, _2194, _2194, _2170, _2170] ;
    List = [_2278, _2230, _2230, _2278, _2194, _2194, _2170, _2170] ;
    List = [_2278, _2230, _2230, _2194, _2278, _2194, _2170, _2170] ;
    List = [_2278, _2230, _2230, _2194, _2194, _2278, _2170, _2170] ;
    List = [_2278, _2230, _2230, _2194, _2194, _2170, _2278, _2170] .
    

    How long does it take to enumerate all variants?

    ?- time((puzzle_(CharacterPlaceholders), false)).
    % 845,559 inferences, 0.040 CPU in 0.040 seconds (100% CPU, 21324130 Lips)
    false.
    

    Seems reasonable.

    Given these structures, it remains to fill in the placeholders. This consists of assigning letters to the variables in the list (only assigning every letter once, but if the variable appears twice in the list, this will make the letter appear twice as well):

    label(CharacterPlaceholders) :-
        Letters = [a, b, c, d, e, f],
        label(CharacterPlaceholders, Letters).
    
    label([], _Letters).
    label([Var | MaybeVars], Letters) :-
        (   var(Var)
        ->  select(Var, Letters, RemainingLetters),
            label(MaybeVars, RemainingLetters)
        ;   label(MaybeVars, Letters) ).
    

    For example:

    ?- puzzle_(List), label(List).
    List = [a, a, b, b, c, c, d, d] ;
    List = [a, a, b, b, c, c, e, e] ;
    List = [a, a, b, b, c, c, f, f] ;
    List = [a, a, b, b, d, d, c, c] ;
    List = [a, a, b, b, d, d, e, e] ;
    List = [a, a, b, b, d, d, f, f] ;
    List = [a, a, b, b, e, e, c, c] ;
    List = [a, a, b, b, e, e, d, d] .
    

    What's the cost of enumerating all solutions of lengths 8, 9, and 10, respectively?

    ?- between(8, 10, Length), length(List, Length), time((puzzle_(List), label(List), false)).
    % 7,306,838 inferences, 0.334 CPU in 0.334 seconds (100% CPU, 21869592 Lips)
    % 128,616,758 inferences, 6.052 CPU in 6.052 seconds (100% CPU, 21252001 Lips)
    % 918,924,038 inferences, 46.998 CPU in 46.999 seconds (100% CPU, 19552521 Lips)
    false.