I have an ongoing research project that requires me to keep only the paragraphs of each .txt file that contain certain keywords. Is there any way to do that?
keywords = ["cryptocurren", "virtual curren", "digital curren"]
Sample text:
The widespread adoption of new technologies, including internet services, cryptocurrencies and payment systems, could require substantial expenditures to modify or adapt our existing products and services as we grow and develop our internet banking and mobile banking channel strategies in addition to remote connectivity solutions.
A significant natural disaster, such as a tornado, hurricane, earthquake, fire or flood, could have a material adverse impact on our ability to conduct business, and our insurance coverage may be insufficient to compensate for losses that may occur. Acts of terrorism, war, civil unrest, or pandemics could cause disruptions to our business or the economy as a whole. While we have established and regularly test disaster recovery procedures, the occurrence of any such event could have a material adverse effect on our business, operations and financial condition.
As shown above, only the first paragraph contains a keyword from the list ("cryptocurrencies" contains "cryptocurren"), so I want the .txt file to contain only the first paragraph.
Thank you in advance!
In short, I am looking for a way to keep only the paragraphs of a .txt file that contain at least one of the keywords.
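If, as in the sample above, each paragraph sits on a single line of the file, no paragraph detection is needed: keeping a paragraph just means keeping a line. A minimal sketch under that assumption; the helper name `keep_lines_with_keywords` is hypothetical, the two sample paragraphs are shortened, and matching is a case-sensitive substring test, which is why the stem "cryptocurren" finds "cryptocurrencies".

```python
# Keyword stems from the question. A plain substring test is enough:
# "cryptocurren" is found inside "cryptocurrencies".
keywords = ["cryptocurren", "virtual curren", "digital curren"]


def keep_lines_with_keywords(text, keywords):
    """Keep only the lines (assumed: one paragraph per line) that contain a keyword."""
    return [line for line in text.splitlines() if any(k in line for k in keywords)]


# Shortened versions of the two sample paragraphs, one per line.
sample = (
    "The widespread adoption of new technologies, including internet services, "
    "cryptocurrencies and payment systems, could require substantial expenditures.\n"
    "A significant natural disaster, such as a tornado, hurricane, earthquake, "
    "fire or flood, could have a material adverse impact on our business.\n"
)

kept = keep_lines_with_keywords(sample, keywords)
print(len(kept))  # 1
```

Only the first line survives; joining `kept` with `"\n"` gives the text to write back to the file.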
You have to split the text into paragraphs first and then search each paragraph for the keywords. I used a regex that treats a blank line as the paragraph separator:
import re

data = """The widespread adoption of new technologies, including internet
services, cryptocurrencies and payment systems, could require
substantial expenditures to modify or adapt our existing products
and services as we grow and develop our internet banking and
mobile banking channel strategies in addition to remote
connectivity solutions.

A significant natural disaster, such as a tornado, hurricane,
earthquake, fire or flood, could have a material adverse impact on
our ability to conduct business, and our insurance coverage may
be insufficient to compensate for losses that may occur. Acts of
terrorism, war, civil unrest, or pandemics could cause disruptions
to our business or the economy as a whole. While we have
established and regularly test disaster recovery procedures, the
occurrence of any such event could have a material adverse effect
on our business, operations and financial condition."""

keywords = ["cryptocurren", "virtual curren", "digital curren"]
# keywords = ["insurance"]

# Each match is one paragraph: a run of non-newline characters, each
# optionally followed by a single newline. Two newlines in a row (a blank
# line) end the match, so blank lines act as paragraph separators.
for match in re.finditer(r'(?:[^\n]\n?)+', data):
    print(match.start(), match.end())
    paragraph = match.group()
    # Print the paragraph only if it contains at least one keyword.
    if any(word in paragraph for word in keywords):
        print(paragraph)
Output:
0 334
The widespread adoption of new technologies, including internet
services, cryptocurrencies and payment systems, could require
substantial expenditures to modify or adapt our existing products
and services as we grow and develop our internet banking and
mobile banking channel strategies in addition to remote
connectivity solutions.
335 905
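The question asks about every .txt file, while the loop above only prints matches from one string. A sketch that applies the same idea to a whole folder, under stated assumptions: paragraphs are separated by blank lines, files are UTF-8, and each result goes to a new file with a `_filtered` suffix instead of overwriting the original. `filter_paragraphs`, `filter_folder`, and the suffix are hypothetical names. It splits with `re.split` on blank lines rather than using the `finditer` pattern, and it lowercases both sides so that "Cryptocurrencies" at the start of a sentence also matches; drop the `.lower()` calls to keep the case-sensitive behavior of the loop above.

```python
import re
from pathlib import Path

# Keyword stems from the question; matching is a substring test, so
# "cryptocurren" also matches "cryptocurrency" and "cryptocurrencies".
KEYWORDS = ["cryptocurren", "virtual curren", "digital curren"]

# Assumption: paragraphs are separated by one or more blank lines
# (lines that are empty or contain only whitespace).
PARAGRAPH_BREAK = re.compile(r"\n\s*\n")


def filter_paragraphs(text, keywords):
    """Return only the paragraphs of `text` that contain at least one keyword.

    Matching is case-insensitive; kept paragraphs are stripped and re-joined
    with a single blank line between them.
    """
    lowered = [k.lower() for k in keywords]
    kept = [
        p.strip()
        for p in PARAGRAPH_BREAK.split(text)
        if any(k in p.lower() for k in lowered)
    ]
    return "\n\n".join(kept)


def filter_folder(folder, keywords, suffix="_filtered"):
    """Write a filtered copy of every .txt file in `folder`.

    Hypothetical layout: report.txt -> report_filtered.txt in the same
    folder, so the originals are left untouched. Assumes UTF-8 text.
    """
    # sorted() collects the file list before any new file is written.
    for path in sorted(Path(folder).glob("*.txt")):
        if path.stem.endswith(suffix):
            continue  # skip outputs from an earlier run
        text = path.read_text(encoding="utf-8")
        out = path.with_name(path.stem + suffix + path.suffix)
        out.write_text(filter_paragraphs(text, keywords), encoding="utf-8")
```

For example, `filter_folder("texts", KEYWORDS)` would process every `.txt` file in a folder named `texts` (a placeholder path). A file with no matching paragraph produces an empty output file.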