How to get surrounding words of substring in string, if the substring repeats itself?


I have a task where I need to fetch N words before and after every occurrence of a substring (which could be multiple words) in a string. I initially considered using str.split(" ") and working with the resulting list, but the substring I'm looking for can span multiple words.

I've tried using str.partition and it's very close to doing exactly what I want, but it only splits on the first occurrence of the keyword.

Code:

text = "Hello World how are you doing Hello is the keyword I'm trying to get Hello is a repeating word"
part = text.partition("Hello")
part = list(map(str.strip, part))

Output:

['', 'Hello', "World how are you doing Hello is the keyword I'm trying to get Hello is a repeating word"]

This gets me exactly what I need for the first keyword: I have enough to then get the preceding and following words. Unfortunately, it fails when the substring I'm looking for appears more than once.

If the output could instead be a list of partitions, one for each occurrence, then I could make it work. How should I approach this?


Solution

text = "Hello World how are you doing Hello is the keyword I'm trying to get Hello is a repeating word"

def recursive_partition(text, pattern):
    # Nothing left to split: return an empty list so the caller can concatenate it.
    if not text:
        return []
    head, sep, tail = text.partition(pattern)
    if sep:
        # Pattern found: keep the text before it and the match, then split the rest.
        return [head, sep] + recursive_partition(tail, pattern)
    # Pattern not found: partition() returns the whole text as the first item.
    return [head]

res = recursive_partition(text, "Hello")
print(res)  # ['', 'Hello', ' World how are you doing ', 'Hello', " is the keyword I'm trying to get ", 'Hello', ' is a repeating word']