Search code examples
pythongeneratorlazy-sequenceslexicographic

Generate strings in lexicographical order in Python


How can I write a Python generator that lazily generates all strings composed of lowercase English letters of up to a certain length1?

I have written my own solution (posted below as an answer), but I would like to see if there are any more elegant/efficient/fun solutions out there.


1 An infinite iterator would be pretty useless because it would just generate strings composed of only the character a. This is because the lexicographical ordering of strings is not a well-order; it can be thought of as composed of an infinite sequence of infinitely nested sequences: (a, (aa, ...), (ab, ...), ...), (b, (ba, ...), (bb, ...), ...), ... The generator would never reach ab since it has an infinite amount of predecessors.


Solution

  • Here is my solution:

    import string
    
    
    def lexstrings(max_length: int, alphabet=string.ascii_lowercase):
        yield ""
        if max_length == 0: return
        for first in alphabet:
            for suffix in lexstrings(max_length - 1, alphabet=alphabet):
                yield first + suffix
    

    Example:

    >>> g = lexstrings(max_length=3, alphabet="ab")
    >>> list(g)
    ['',
     'a',
     'aa',
     'aaa',
     'aab',
     'ab',
     'aba',
     'abb',
     'b',
     'ba',
     'baa',
     'bab',
     'bb',
     'bba',
     'bbb']
    

    This might not be the best solution because it involves recursion and using the + operator m times to generate a string of length m, which isn't efficient because Python generates copies of the intermediate results (since strings are immutable).

    This implementation also "supports" the infinite version:

    >>> g = lexstrings(-1)
    >>> next(g)
    ''
    >>> next(g)
    'a'
    >>> next(g)
    'aa'
    >>> next(g)
    'aaa'
    ...