python-3.x pandas datetime python-re python-dateutil

Python: read data from a txt file using re.findall for timestamps

I am reading data from a txt file and extracting the blocks between the #### separator lines. When a separator line has the same timestamp as the two lines before it, my code picks the first line with that timestamp and splits the block there. In the console output (see "Print data:" below) you can see a gap inside the ZZZZZZ block: three timestamps are equal, so the code detects the first one and inserts the separation at that point. As a result, the wrong name ends up in the CSV file. How can I read the data correctly? The split should always happen at the separator line (#####).

    Input.txt file:
    2020-08-28T11:46:24.8419656Z ################################################################################
    2020-08-28T11:46:24.8419656Z XXXXXX
    2020-08-28T11:46:39.9397372Z Execution 0
    2020-08-28T11:46:39.9417366Z Creation 0
    2020-08-28T11:46:41.4877509Z Build 0
    2020-08-28T11:48:02.6957708Z Level 0 
    2020-08-28T11:48:02.7227683Z Converting file start
    2020-08-28T11:48:11.7408315Z Converting done 0
    2020-08-28T11:48:11.8148285Z Checking results
    2020-08-28T11:48:11.8418281Z Test Status XXXXXX: Success
    2020-08-28T11:48:11.8498273Z ################################################################################
    2020-08-28T11:48:11.8498273Z YYYYYY
    2020-08-28T11:48:27.1533026Z Execution 0
    2020-08-28T11:48:27.1583035Z Creation 0
    2020-08-28T11:48:28.6763028Z Build 0
    2020-08-28T11:49:31.9180832Z Level 0 
    2020-08-28T11:49:31.9440848Z ##[error]
    2020-08-28T11:49:31.9530839Z ################################################################################
    2020-08-28T11:50:24.8419656Z ZZZZZZ
    2020-08-28T11:50:39.9397372Z Execution 0
    2020-08-28T11:50:39.9417366Z Creation 0
    2020-08-28T11:50:41.4877509Z Build 0
    2020-08-28T11:51:02.6957708Z Level 0 
    2020-08-28T11:51:02.7227683Z Converting file start
    2020-08-28T11:51:11.7408315Z Converting done 0
    2020-08-28T11:51:11.8418281Z Checking results
    2020-08-28T11:51:11.8418281Z Test Status ZZZZZZ: Success
    2020-08-28T11:51:11.8418281Z ################################################################################
    2020-08-28T11:53:24.8419656Z DDDDDD
    2020-08-28T11:53:39.9397372Z Execution 0
    2020-08-28T11:53:39.9417366Z Creation 0
    2020-08-28T11:53:41.4877509Z Build 0
    2020-08-28T11:53:02.6957708Z Level 0 
    2020-08-28T11:54:02.7227683Z Converting file start
    2020-08-28T11:54:11.7408315Z Converting done 0
    2020-08-28T11:54:11.8148285Z Checking results
    2020-08-28T11:54:11.8418281Z Test Status DDDDDD: Success
    2020-08-28T11:54:11.8498273Z ################################################################################

    Code:

    import re
    from dateutil import parser
    import pandas as pd
    
    with open('1_Build.txt') as file:
        data = file.read()
    
    timestamps = re.findall(r'(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}.+Z)\s#{3,}', data)
    text = []
    dict_list = []
    for i in range(len(timestamps)-1):
        text.append(data[data.index(timestamps[i]):data.index(timestamps[i+1])])
        time_diff = parser.isoparse(timestamps[i+1]) - parser.isoparse(timestamps[i])
        print(text[-1])
        lines = text[-1].split('\n')
        dict = {}
        dict['name'] = lines[1].split(' ')[1]
        dict['execution'] = (parser.isoparse(lines[3].split(' ')[0]) - parser.isoparse(lines[2].split(' ')[0])).seconds
        dict['creation'] = (parser.isoparse(lines[4].split(' ')[0]) - parser.isoparse(lines[3].split(' ')[0])).seconds
        dict['build'] = (parser.isoparse(lines[5].split(' ')[0]) - parser.isoparse(lines[4].split(' ')[0])).seconds
        dict['level'] = (parser.isoparse(lines[6].split(' ')[0]) - parser.isoparse(lines[5].split(' ')[0])).seconds
        if "error" in lines[-2]:
            dict['test_status'] = 1
            dict_list.append(dict)
            continue
        elif "Success" in lines[-2]:
            dict['test_status'] = 0
            dict['converting'] = (parser.isoparse(lines[7].split(' ')[0]) - parser.isoparse(lines[6].split(' ')[0])).seconds
            dict['checking'] = (parser.isoparse(lines[8].split(' ')[0]) - parser.isoparse(lines[7].split(' ')[0])).seconds
        dict_list.append(dict)
    
    
    df = pd.DataFrame(dict_list)
    df.to_csv('output.csv')


Print data:
2020-08-28T11:46:24.8419656Z ################################################################################
2020-08-28T11:46:24.8419656Z XXXXXX
2020-08-28T11:46:39.9397372Z Execution 0
2020-08-28T11:46:39.9417366Z Creation 0
2020-08-28T11:46:41.4877509Z Build 0
2020-08-28T11:48:02.6957708Z Level 0 
2020-08-28T11:48:02.7227683Z Converting file start
2020-08-28T11:48:11.7408315Z Converting done 0
2020-08-28T11:48:11.8148285Z Checking results
2020-08-28T11:48:11.8418281Z Test Status XXXXXX: Success

2020-08-28T11:48:11.8498273Z ################################################################################
2020-08-28T11:48:11.8498273Z YYYYYY
2020-08-28T11:48:27.1533026Z Execution 0
2020-08-28T11:48:27.1583035Z Creation 0
2020-08-28T11:48:28.6763028Z Build 0
2020-08-28T11:49:31.9180832Z Level 0 
2020-08-28T11:49:31.9440848Z ##[error]

2020-08-28T11:49:31.9530839Z ################################################################################
2020-08-28T11:50:24.8419656Z ZZZZZZ
2020-08-28T11:50:39.9397372Z Execution 0
2020-08-28T11:50:39.9417366Z Creation 0
2020-08-28T11:50:41.4877509Z Build 0
2020-08-28T11:51:02.6957708Z Level 0 
2020-08-28T11:51:02.7227683Z Converting file start
2020-08-28T11:51:11.7408315Z Converting done 0

2020-08-28T11:51:11.8418281Z Checking results
2020-08-28T11:51:11.8418281Z Test Status ZZZZZZ: Success
2020-08-28T11:51:11.8418281Z ################################################################################
2020-08-28T11:53:24.8419656Z DDDDDD
2020-08-28T11:53:39.9397372Z Execution 0
2020-08-28T11:53:39.9417366Z Creation 0
2020-08-28T11:53:41.4877509Z Build 0
2020-08-28T11:53:02.6957708Z Level 0 
2020-08-28T11:54:02.7227683Z Converting file start
2020-08-28T11:54:11.7408315Z Converting done 0
2020-08-28T11:54:11.8148285Z Checking results
2020-08-28T11:54:11.8418281Z Test Status DDDDDD: Success

Solution

When extracting the block for each operation, check whether the timestamps on the first and third lines are equal. If they are, the block started two lines too early, so drop the first two lines before processing.
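To see why a block can start too early, here is a minimal sketch on a shortened sample (the sample text is abbreviated from the question; the variable names are only illustrative). The timestamp captured by the regex is looked up again as plain text, and str.index returns the first place that text occurs, which may be an earlier line with the same timestamp rather than the separator.

```python
import re

# Shortened sample: the separator shares its timestamp with the two lines above it.
data = (
    "2020-08-28T11:51:11.7408315Z Converting done 0\n"
    "2020-08-28T11:51:11.8418281Z Checking results\n"
    "2020-08-28T11:51:11.8418281Z Test Status ZZZZZZ: Success\n"
    "2020-08-28T11:51:11.8418281Z ########\n"
)

pattern = r'(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}.+Z)\s#{3,}'
match = re.search(pattern, data)
timestamp = match.group(1)

# str.index returns the FIRST occurrence of the timestamp text,
# which here is the "Checking results" line, not the separator.
found_at = data.index(timestamp)
first_line_hit = data[found_at:].split('\n')[0]

# match.start() is the offset where the separator line itself begins.
separator_line = data[match.start():].split('\n')[0]

print(first_line_hit)
print(separator_line)
```

The first print shows the "Checking results" line and the second shows the separator line, which is the position the split was meant to use.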

Only one change is needed.

Try this code:

for i in range(len(timestamps)-1):
    text.append(data[data.index(timestamps[i]):data.index(timestamps[i+1])])
    time_diff = parser.isoparse(timestamps[i+1]) - parser.isoparse(timestamps[i])
    print(text[-1])
    lines = text[-1].split('\n')
    if lines[0].split(' ')[0] == lines[2].split(' ')[0]: lines = lines[2:]  # add this line
    dict = {}
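One caveat with the shift: it repairs the block that starts too early, but the block before it (ZZZZZZ in the sample) is still cut off before its "Checking results" and "Test Status" lines, so neither the "error" nor the "Success" branch runs for it. As an alternative sketch, not the code above, the blocks can be cut at the offsets where the separator lines actually match. The names SEPARATOR and split_blocks are hypothetical, and the sample is abbreviated from the question.

```python
import re

# A separator is a whole line: timestamp ending in 'Z', whitespace, then three or more '#'.
SEPARATOR = re.compile(r'^\S+Z\s#{3,}\s*$', re.MULTILINE)


def split_blocks(data):
    """Return one list of lines per block, each starting with its separator line."""
    starts = [m.start() for m in SEPARATOR.finditer(data)]
    blocks = []
    # Pair each separator offset with the next one; the last separator only closes a block.
    for begin, end in zip(starts, starts[1:]):
        blocks.append(data[begin:end].splitlines())
    return blocks


sample = """\
2020-08-28T11:49:31.9530839Z ########
2020-08-28T11:50:24.8419656Z ZZZZZZ
2020-08-28T11:51:11.8418281Z Checking results
2020-08-28T11:51:11.8418281Z Test Status ZZZZZZ: Success
2020-08-28T11:51:11.8418281Z ########
2020-08-28T11:53:24.8419656Z DDDDDD
2020-08-28T11:54:11.8418281Z Test Status DDDDDD: Success
2020-08-28T11:54:11.8498273Z ########
"""

blocks = split_blocks(sample)
names = [block[1].split(' ')[1] for block in blocks]
# splitlines() leaves no trailing empty string, so the status line is block[-1].
last_lines = [block[-1] for block in blocks]
print(names)
print(last_lines)
```

Because the offsets come from the separator matches themselves, equal timestamps on neighbouring lines no longer matter. If you adapt your loop to this, note that the status line would be lines[-1] rather than lines[-2].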

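Note that your input file has an out-of-order timestamp in the last block: the Level 0 line (11:53:02) is stamped earlier than the Build 0 line before it (11:53:41), so the build step appears to end before it starts.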
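A side effect worth knowing: when the difference is negative, the .seconds attribute does not return a negative number. A small sketch using the two timestamps from the last block (truncated to microseconds, and written as datetime objects directly so it does not depend on the parser):

```python
from datetime import datetime

# Timestamps from the last block, truncated to microseconds.
build = datetime(2020, 8, 28, 11, 53, 41, 487750)
level = datetime(2020, 8, 28, 11, 53, 2, 695770)

diff = level - build  # negative: Level is stamped before Build

# timedelta normalizes to days=-1 plus a positive seconds part,
# so .seconds is large and positive instead of negative.
print(diff.days)             # -1
print(diff.seconds)          # 86361
print(diff.total_seconds())  # a negative float, roughly -38.8
```

If negative or multi-day durations can occur in your logs, total_seconds() is probably the safer value to store in the CSV.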