How can I read a text file, count the occurrences of a specific word, and print the date and time from the preceding timestamp line whenever that word is found?
For example, given this text file:
2023-09-15T16:55:10Z word word
word word
word **specific word** word word
word word
word word
2023-09-15T16:56:20Z word word
word word
word word
word word
2023-09-15T16:57:30Z word word
word word
word **specific word** word word
The result I want:
Number of occurrences of the word : 2
2023-09-15T16:55:10
2023-09-15T16:57:30
Right now my code counts the occurrences correctly, but it prints every line instead of only the matching timestamps:
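f = open('file_name.txt', 'r')
data = f.read()  # read the whole file so the word can be counted
occurrences = data.count("specific word")
print('Number of occurrences of the word :', occurrences)
f.seek(0)  # rewind so the loop below can iterate over the lines
for line in f:
    f = line[:19]
    print(f)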
and the result looks like this:
Number of occurrences of the word : 2
2023-09-15T16:55:10
word word
word **specific word**
word word
word word
word word
2023-09-15T16:56:20
word word
word word
2023-09-15T16:57:30
word word
word **specific word**
Thanks so much.
One way to produce your desired result is to use regular expressions.
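First, we split your text into chunks wherever a date is found. Because the date pattern is wrapped in a capturing group, re.split keeps each date in the resulting list, immediately before the chunk that follows it.
Next, we search through the list of chunks for text matching **specific word**. When it is found, we record the preceding date.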
import re

with open('file_name.txt', 'r') as f:
    txt = f.read()

# The capturing group makes re.split keep each date in the result list
re_date = r'(\d{4}\-\d{2}\-\d{2}T\d{2}\:\d{2}\:\d{2}Z)'
txt_split = re.split(re_date, txt)

results = []
for i, t in enumerate(txt_split):
    # A chunk that contains the target is preceded in the list by its date
    if re.search(r'\*\*specific word\*\*', t) and i > 0:
        results.append(txt_split[i-1])

print(f'Number of occurrences of the word : {len(results)}')
print('\n'.join(results))
The above code produces the desired result, except that each timestamp keeps its trailing Z:
Number of occurrences of the word : 2
2023-09-15T16:55:10Z
2023-09-15T16:57:30Z
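If the timestamps should appear without the trailing Z, as in the question's expected output, a line-by-line variant may be easier to adapt. The following is only a sketch: the helper name find_dates, the constant names, and the inline sample text are made up for illustration, and it assumes that every timestamp sits at the start of its line, as in the sample file. It counts every occurrence of the phrase, but lists each timestamp at most once.

```python
import re

# Assumption: a timestamp always starts its line, as in the sample file.
# Group 1 captures the date and time without the trailing Z.
DATE_AT_START = re.compile(r'(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z')
TARGET = 'specific word'


def find_dates(lines, target=TARGET):
    """Return (total occurrences of target, timestamps of blocks containing it)."""
    count = 0
    dates = []
    current = None     # most recent timestamp seen so far
    reported = False   # True once `current` has been added to `dates`
    for line in lines:
        m = DATE_AT_START.match(line)
        if m:
            current = m.group(1)
            reported = False
        hits = line.count(target)
        count += hits
        if hits and current is not None and not reported:
            dates.append(current)
            reported = True
    return count, dates


# Hypothetical stand-in for the contents of file_name.txt
sample = """2023-09-15T16:55:10Z word word
word word
word **specific word** word word
word word
2023-09-15T16:56:20Z word word
word word
2023-09-15T16:57:30Z word word
word **specific word** word word
"""

count, dates = find_dates(sample.splitlines())
print('Number of occurrences of the word :', count)
print('\n'.join(dates))
```

To run it against the real file, pass the open file object instead of the sample lines, for example find_dates(f) inside a with open('file_name.txt') block.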
The re_date pattern above is deliberately oversimplified: it accepts any digits arranged in yyyy-MM-dd'T'HH:mm:ssZ format. In production, you may want to require that the year begins with 19 or 20 and that months and days are limited to 01-12 and 01-31, respectively. There may be additional constraints as well.
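One possible way to tighten the check is sketched below. The names STRICT_DATE and is_valid_timestamp are hypothetical, and the limits encoded in the pattern (years 1900-2099, seconds 00-59) are assumptions rather than requirements from the question. Since a regular expression alone cannot tell that 30 February does not exist, the sketch also passes the text to datetime.strptime, which rejects impossible calendar dates.

```python
import re
from datetime import datetime

# Assumed constraints: year 1900-2099, month 01-12, day 01-31,
# hour 00-23, minute and second 00-59, followed by a literal Z.
STRICT_DATE = re.compile(
    r'(?:19|20)\d{2}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])'
    r'T(?:[01]\d|2[0-3]):[0-5]\d:[0-5]\dZ'
)


def is_valid_timestamp(text):
    """Return True if text matches the pattern and is a real calendar date."""
    if not STRICT_DATE.fullmatch(text):
        return False
    try:
        # strptime raises ValueError for dates such as 30 February
        datetime.strptime(text, '%Y-%m-%dT%H:%M:%SZ')
    except ValueError:
        return False
    return True


print(is_valid_timestamp('2023-09-15T16:55:10Z'))
print(is_valid_timestamp('2023-02-30T00:00:00Z'))
print(is_valid_timestamp('2023-13-01T00:00:00Z'))
```

The first call should print True. The other two should print False: the second is rejected by strptime and the third by the pattern itself.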