python, regex, re2

How to get all possible interpretations in a regex match?


I want to match queries like "Who acted as tony montana in Scarface" with the template "Who acted as (?P<role>.*) in (?P<movie>.*)".

If the role name or the movie name contains the word "in", the regex match goes wrong.

E.g., "Who acted as k in men in black" gives "k in men" as the role.

A non-greedy approach may work for this query, but it breaks as soon as the role name itself contains the word "in". How do I get all possible interpretations here?
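For reference, here is a minimal sketch of what the two variants capture, using Python's standard `re` module on the query above. The lazy pattern (`.*?`) is my assumption of what "non-greedy" would look like here. It stops at the first "in", so it would in turn mis-split a role that itself contains "in".

```python
import re

query = "Who acted as k in men in black"

# Greedy role: .* grabs as much as it can, so the split lands on the LAST " in ".
greedy = re.match(r"Who acted as (?P<role>.*) in (?P<movie>.*)", query)

# Lazy role: .*? grabs as little as it can, so the split lands on the FIRST " in ".
lazy = re.match(r"Who acted as (?P<role>.*?) in (?P<movie>.*)", query)

print(greedy.groupdict())  # {'role': 'k in men', 'movie': 'black'}
print(lazy.groupdict())    # {'role': 'k', 'movie': 'men in black'}
```

Either way a single match yields only one interpretation, never the full set.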


Solution

  • Given a phrase like 'a in b in c in d', this generates every possible split at the word "in":

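    phrase = 'a in b in c in d'
    words = phrase.split()
    
    # Split at every standalone 'in': the words before it vs. the words after it.
    for n, w in enumerate(words):
        if w == 'in':
            print('(%s) in (%s)' % (
                ' '.join(words[:n]),
                ' '.join(words[n+1:])))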
    

    For your specific problem, if the phrase contains three occurrences of "in", the "middle" interpretation, (a in b) in (c in d), is most likely the correct one. With two occurrences, however, text manipulation alone cannot settle it, because the "left" and "right" splits are equally probable. Consider:

    Who acted as jeebs in men in black
    Who acted as woman in red in matrix
    

    You'll have to use NLP or database-driven methods to parse this correctly.
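
As a rough illustration of the database-driven idea, the sketch below enumerates every (role, movie) split of a query and keeps only those whose movie part is a known title. `KNOWN_MOVIES` is a hypothetical stand-in for a real movie database, and `interpretations` and `plausible` are names made up for this example.

```python
import re

# Hypothetical stand-in for a real movie database lookup.
KNOWN_MOVIES = {"men in black", "matrix", "scarface"}

PREFIX = re.compile(r"who acted as\s+(?P<rest>.+)", re.IGNORECASE)


def interpretations(query):
    """Return every (role, movie) split of the query at the word 'in'."""
    match = PREFIX.match(query.strip())
    if match is None:
        return []
    words = match.group("rest").split()
    return [
        (" ".join(words[:n]), " ".join(words[n + 1:]))
        for n, w in enumerate(words)
        # Skip an 'in' at either end, which would leave an empty role or movie.
        if w.lower() == "in" and 0 < n < len(words) - 1
    ]


def plausible(query, known_movies=KNOWN_MOVIES):
    """Keep only the splits whose movie part is a known title."""
    return [(role, movie) for role, movie in interpretations(query)
            if movie.lower() in known_movies]


print(plausible("Who acted as jeebs in men in black"))
# [('jeebs', 'men in black')]
print(plausible("Who acted as woman in red in matrix"))
# [('woman in red', 'matrix')]
```

With a real title list the same filter resolves both example queries, but it can still return several candidates (or none) when the data is ambiguous or incomplete.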