Search code examples
pythonpython-3.xrecursionpython-itertoolscartesian-product

How to break early from a cartesian product recursive function once a solution is found?


I am analyzing the phonetic composition of words, and as part of this I have been using cartesian products to match spelling permutations with a given word. Each sound in a word can be represented by several spellings, and the program determines the correct spelling for each sound in a word. There are an unknown number of lists, of unknown length.

I am currently user itertools' product() inside of a list comprehension, i.e. brute-forced, every permutation checked before returning a value. Here is the relevant part in Python 3:

from itertools import product


def cartesian_match(string, iterables):
    """Gets the phonetic spelling breakdown of a word via cartesian product.

    Args:
        string (str):     String for which a matched spelling is wanted.
        iterables (list): A list of lists of unknown number and length.
                          Each sublist contains only str elements.
                          Each sublist contains all possible spellings of a
                          phoneme.

    Returns:
        list: the first matched list of spelling units.

    Example (simplified):
      Args:
        string = "python"
        iterables = [
          'p', 'pp'],['i', 'ie', 'y', 'igh'],['th'],['or', 'ou', 'e', 'o'],[
          'nd', 'nn', 'n', 'ne']

      Returns:
        ['p', 'y', 'th', 'o', 'n']

    """
    return [x for x in product(*iterables) if "".join(x) == string][0]

For complex words, the cartesian product is large, tens of millions of permutations. Some words take upwards of 15 minutes to compute. I have thousands of words to analyze so speed is currently an issue.

To speed things up, I need a function which returns the value as soon as it is discovered, rather than forming a cartesian product and having to run through each and every permutation. It would also allow me to optimize the sequence of elements inside each sub-list in order to get the matched value sooner.

My challenge is that I cannot figure out how to do this iteratively with an unknown number of lists of unknown length, and I've failed at any attempt to break out of a recursive function early.

Can anybody point me in the right direction?


Solution

  • for x in in product(*iterables):
        if "".join(x) == string:
            return x
    

    BTW: your function is not recursive - the title of this question is misleading.