Cutting a string between a start keyword and an end keyword in Python

I have a PDF that I read with the tika package in Python. It seems Tika can only read a whole PDF, and I need only the first page.

My code looks like:

from tika import parser
raw = parser.from_file(pdfname)
rawtext = raw['content']

I would like to extract the part of rawtext that lies between a start keyword and an end keyword. How do I do that?

Solution

You can use a regex to select the text you are interested in, for example:

import re


raw_text = 'this is a sample of text'
start = 'is'
end = 'of'

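# re.escape keeps regex metacharacters in the keywords literal;
# \b matches whole words only, so 'is' does not match inside 'this'.
start_match = re.search(r'\b' + re.escape(start) + r'\b', raw_text)
# Search for the end keyword only after the start keyword.
end_match = re.compile(r'\b' + re.escape(end) + r'\b').search(raw_text, start_match.end())
section_of_text = raw_text[start_match.start():end_match.end()]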
print(section_of_text)

# Output: is a sample of
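
With real PDF text, one of the keywords may be missing, in which case re.search returns None and calling .start() on it raises an AttributeError. Below is a minimal sketch of the same idea wrapped in a helper that returns None instead. The function name extract_between and the include_keywords flag are made up for this illustration and are not part of re or tika. It also assumes both keywords begin and end with word characters, since \b would not match next to punctuation.

```python
import re
from typing import Optional


def extract_between(text: str, start: str, end: str,
                    include_keywords: bool = True) -> Optional[str]:
    """Return the part of `text` from `start` to the first `end` after it.

    Returns None if either keyword is not found as a whole word.
    (Hypothetical helper, for illustration only.)
    """
    # Escape the keywords so characters such as '.' or '(' are taken literally.
    start_match = re.search(r'\b' + re.escape(start) + r'\b', text)
    if start_match is None:
        return None

    # Only look for the end keyword after the end of the start keyword.
    end_pattern = re.compile(r'\b' + re.escape(end) + r'\b')
    end_match = end_pattern.search(text, start_match.end())
    if end_match is None:
        return None

    if include_keywords:
        return text[start_match.start():end_match.end()]
    # Otherwise return only what lies strictly between the two keywords.
    return text[start_match.end():end_match.start()].strip()


sample = 'this is a sample of text'
print(extract_between(sample, 'is', 'of'))                          # is a sample of
print(extract_between(sample, 'is', 'of', include_keywords=False))  # a sample
print(extract_between(sample, 'is', 'missing'))                     # None
```

In the question's setting you would pass rawtext as the first argument and check the result for None before using it.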