
Python: how to read a text file, count a specific word, and print the date and time from an earlier line when the word is found


How can I read a text file, count the occurrences of a specific word, and print the date and time from the preceding timestamp line each time the word is found?

For example, given this text file:

2023-09-15T16:55:10Z word word
word word
word **specific word** word word
word word 
word word
2023-09-15T16:56:20Z word word
word word
word word
word word
2023-09-15T16:57:30Z word word
word word
word **specific word** word word

The result I want:

Number of occurrences of the word : 2
2023-09-15T16:55:10
2023-09-15T16:57:30

My current code counts the occurrences correctly, but it prints every line instead of only the matching timestamps:

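with open('file_name.txt', 'r') as f:
    data = f.read()
occurrences = data.count("specific word")
print('Number of occurrences of the word :', occurrences)
# Prints the first 19 characters of every line, not only the timestamps
for line in data.splitlines():
    print(line[:19])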

The result looks like this:

Number of occurrences of the word : 2
2023-09-15T16:55:10
word word
word **specific word**
word word

word word
word word

2023-09-15T16:56:20
word word
word word

2023-09-15T16:57:30
word word
word **specific word**

Thanks so much.


Solution

  • One way to produce your desired result is to use regular expressions.

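    First, we split your text at every timestamp. Because the date pattern is wrapped in a capturing group, re.split keeps the matched timestamps in the resulting list, so each chunk of text is immediately preceded by its date.

    Next, we search through the list of chunks for text matching **specific word**. When it is found, we record the preceding date.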
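
As a quick illustration of the split step, here is a minimal sketch run on a short, hypothetical sample string rather than the real file. The names `sample`, `date_pattern`, and `parts` are made up for this example; it only shows how `re.split` lays out its result when the pattern contains a capturing group.

```python
import re

# Hypothetical two-entry sample, shaped like the log in the question.
sample = (
    "2023-09-15T16:55:10Z word word\n"
    "word specific word word\n"
    "2023-09-15T16:56:20Z word word\n"
)

# The parentheses form a capturing group, so re.split keeps each
# matched date in the output instead of discarding it.
date_pattern = r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)"

parts = re.split(date_pattern, sample)

# parts[0] is whatever precedes the first date (an empty string here).
# After that, dates sit at odd indices and the text that follows each
# date sits at the next even index.
for index, part in enumerate(parts):
    print(index, repr(part))
```

This layout is why looking one position back from a matching chunk gives the date that chunk belongs to.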

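    import re
    
    # Read the whole file into one string.
    with open('file_name.txt', 'r') as f:
        txt = f.read()
    
    # The capturing group makes re.split keep each matched timestamp in the output.
    re_date = r'(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)'
    
    # Result layout: [text before first date, date, chunk, date, chunk, ...]
    txt_split = re.split(re_date, txt)
    
    results = []
    for i, t in enumerate(txt_split):
        # Matches the literal asterisks shown in the sample; drop the \*\* if
        # your file contains the plain phrase.
        if re.search(r'\*\*specific word\*\*', t) and i > 0:
            results.append(txt_split[i-1])  # the date preceding this chunk
    
    # This counts chunks that contain the phrase, not individual occurrences.
    print(f'Number of occurrences of the word : {len(results)}')
    print('\n'.join(results))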
    

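    The above code prints the following. Unlike the desired output, each timestamp keeps its trailing Z because the Z is inside the capturing group; move it outside the parentheses to drop it: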

    Number of occurrences of the word : 2
    2023-09-15T16:55:10Z
    2023-09-15T16:57:30Z
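
If you would rather keep the line-by-line loop from the question, a possible alternative is to remember the most recent timestamp while scanning and record it whenever the phrase shows up. This is only a sketch under a few assumptions: every timestamp starts its line, the phrase to look for is the plain text `specific word`, and the names `DATE_AT_START` and `find_word_dates` are hypothetical helpers invented here. It counts every occurrence and leaves the trailing Z out of the reported dates.

```python
import re

# Timestamp at the start of a line; the trailing Z stays outside the group.
DATE_AT_START = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z")


def find_word_dates(lines, word="specific word"):
    """Return (total occurrences, dates of the entries containing word)."""
    total = 0
    dates = []
    current_date = None  # most recent timestamp seen so far
    for line in lines:
        match = DATE_AT_START.match(line)
        if match:
            current_date = match.group(1)
        hits = line.count(word)
        if hits:
            total += hits
            # Record a date once even if the word repeats under it.
            if current_date is not None and (not dates or dates[-1] != current_date):
                dates.append(current_date)
    return total, dates


# Sample lines mirroring the file in the question.
sample_lines = [
    "2023-09-15T16:55:10Z word word",
    "word word",
    "word **specific word** word word",
    "2023-09-15T16:56:20Z word word",
    "word word",
    "2023-09-15T16:57:30Z word word",
    "word **specific word** word word",
]

total, dates = find_word_dates(sample_lines)
print("Number of occurrences of the word :", total)
print("\n".join(dates))
```

To run it on a real file, pass the open file object instead of the list, for example `with open('file_name.txt') as f: total, dates = find_word_dates(f)`. Because it iterates line by line, it does not need to load the whole file into memory.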
    

    Appendix

    In this example, I used an oversimplified regular expression to find a date in yyyy-MM-dd'T'HH:mm:ssZ format. In production, you may want to ensure that the year begins with 19 or 20 and that months and days are limited to 01-12 and 01-31, respectively; there may be additional constraints. Note also that the count is the number of timestamped chunks containing the phrase, so a chunk that contains it twice is counted once.
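
One way to tighten the date check without writing a much longer pattern is to keep the simple regular expression for finding candidates and then let `datetime.strptime` reject impossible values. The sketch below assumes the timestamps always use the exact `yyyy-MM-dd'T'HH:mm:ssZ` shape; `CANDIDATE` and `valid_timestamps` are hypothetical names chosen for illustration. It does not enforce the 19/20 year prefix, which would need a separate check.

```python
import re
from datetime import datetime

# Loose pattern: anything shaped like a timestamp, valid or not.
CANDIDATE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def valid_timestamps(text):
    """Return the date-shaped strings in text that are real calendar dates."""
    valid = []
    for candidate in CANDIDATE.findall(text):
        try:
            # strptime raises ValueError for impossible values such as
            # month 13 or February 30.
            datetime.strptime(candidate, "%Y-%m-%dT%H:%M:%SZ")
        except ValueError:
            continue
        valid.append(candidate)
    return valid


print(valid_timestamps("2023-09-15T16:55:10Z ok, 2023-13-45T99:99:99Z not ok"))
```

The design choice here is to let the standard library handle calendar rules, such as the number of days in each month, rather than encoding them in the pattern.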