Tags: python, python-3.x, pandas, python-re

Pandas extract between multiple Start words and multiple stop words


Following on from "Pandas DataFrame extract between one START word and multiple STOP words", is it possible to extend the solution to multiple start words as well? The example below is only illustrative and shouldn't be taken too literally:

df 
   
0   start_word1  text1 end_word1
1   start_word2  text2 end_word2

Expected output

df 
   
0   text1 
1   text2 


Solution

  • You can use non-capturing groups to list the alternative start and stop words, with a single capturing group for the text between them:

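    # Raw string (r'...') so that \s reaches the regex engine instead of being treated as an invalid string escape
    df['COLUMN_NAME'].str.extract(r'(?:start_word1|start_word2)\s+(.*)\s+(?:end_word1|end_word2)')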
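
For a fuller picture, here is a minimal, self-contained sketch of the same approach. The sample data, the `start_words`/`stop_words` lists and the `extracted` column name are made up for illustration. It also differs from the one-liner in two ways that are my own assumptions rather than part of the original answer: the words are passed through `re.escape` in case they contain regex metacharacters, and the capture group is lazy (`.*?`) so it stops at the first stop word instead of the last one in the string.

```python
import re

import pandas as pd

# Hypothetical sample data mirroring the question's example,
# plus one row without any markers to show the no-match case.
df = pd.DataFrame({
    "COLUMN_NAME": [
        "start_word1  text1 end_word1",
        "start_word2  text2 end_word2",
        "no markers here",
    ]
})

# Illustrative word lists; replace with your own.
start_words = ["start_word1", "start_word2"]
stop_words = ["end_word1", "end_word2"]

# Escape each word so characters like '.' or '+' are matched literally,
# then join them into alternations.
start_alt = "|".join(re.escape(w) for w in start_words)
stop_alt = "|".join(re.escape(w) for w in stop_words)

# Non-capturing groups for the delimiters, one capturing group for the payload.
# Assumption: lazy '.*?' so the match ends at the first stop word.
pattern = rf"(?:{start_alt})\s+(.*?)\s+(?:{stop_alt})"

# expand=False returns a Series because the pattern has exactly one capture group.
df["extracted"] = df["COLUMN_NAME"].str.extract(pattern, expand=False)

print(df)
```

Rows that contain no start/stop pair end up as `NaN` in the new column. If you would rather capture up to the last stop word in each string, switch the lazy `.*?` back to the greedy `.*` used in the one-liner.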