Search code examples
pythonstringlongest-substring

Efficiently finding the longest matching prefix string


My current implementation is this:

def find_longest_matching_option(option, options):
    options = sorted(options, key=len)
    longest_matching_option = None
    for valid_option in options:
        # Don't want to treat "oreo" as matching "o",
        # match only if it's "o reo"
        if re.match(ur"^{}\s+".format(valid_option), option.strip()):
            longest_matching_option = valid_option
    return longest_matching_option

Some examples of what I'm trying to do:

"foo bar baz something", ["foo", "foo bar", "foo bar baz"]
# -> "foo bar baz"
"foo bar bazsomething", (same as above)
# -> "foo bar"
"hello world", ["hello", "something_else"]
# -> "hello"
"a b", ["a", "a b"]
# -> "a b" # Doesn't work in current impl.

Mostly, I'm looking for efficiency here. The current implementation works, but I've been told it's O(m^2 * n), which is pretty bad.

Thanks in advance!


Solution

  • Let's start with foo.

    def foo(x, y):
        x, y = x.strip(), y.strip()
        return x == y or x.startswith(y + " ")
    

    foo returns true either if two strings are equal, or one (plus a space) is a substring of the other.

    Next, given a case string, and a list of options, you can use filter to find all valid substrings for the given case string, and then apply max to find the longest one (see tests below).

    Here's a few test cases for foo. For the purpose of demonstrating, I'll use partial to curry foo to a higher order function.

    from functools import partial
    
    cases = ["foo bar baz something", "foo bar bazsomething", "hello world", "a b", "a b"]
    options = [
          ["foo", "foo bar", "foo bar baz"], 
          ["foo", "foo bar", "foo bar baz"],
          ["hello", "something_else"],
          ["a", "a b"],
          ["a", "a b\t"]
    ]
    p_list = [partial(foo, c) for c in cases]
    
    for p, o in zip(p_list, options):
        print(max(filter(p, o), key=len))
    

    foo bar baz
    foo bar
    hello
    a b
    a b