python-2.7, strptime

Python, calculating time difference


I'm parsing logs that are generated by multiple sources and joined together into one huge log file with the following format:

My_testNumber: 14, JobType = testx.

ABC 2234

**SR 111**
1483529571  1   1   Wed Jan  4 11:32:51 2017    0   4
    datatype someRandomValue
SourceCode.Cpp 588

DBConnection failed

TB 132


**SR 284**
1483529572  0   1   Wed Jan  4 11:32:52 2017    5010400     4
    datatype someRandomXX
SourceCode2.cpp 455

DBConnection Success

TB 102

**SR 299**

1483529572  0   1   **Wed Jan  4 11:32:54 2017**    5010400     4
    datatype someRandomXX
SourceCode3.cpp 455

ConnectionManager Success

... (there are dozens more SR numbers here)

Now I'm looking for a smart way to parse the logs so that I can calculate the time difference in seconds between SR numbers for each test number. For example, for My_testNumber: 14, subtracting the time of SR 111 from the time of SR 284 gives 1 second; for SR 284 and SR 299 the difference is 2 seconds; and so on.
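The arithmetic itself is a subtraction of two `datetime` objects. Here is a minimal sketch, assuming the three timestamp strings have already been extracted from the log and that the process runs in the default C/English locale (`%a` and `%b` are locale-dependent):

```python
from datetime import datetime

# Timestamps of SR 111, SR 284 and SR 299, copied from the sample log
stamps = ["Wed Jan  4 11:32:51 2017",
          "Wed Jan  4 11:32:52 2017",
          "Wed Jan  4 11:32:54 2017"]

# strptime turns each run of whitespace in the format into "one or more
# whitespace characters", so one space before %d also matches "Jan  4"
times = [datetime.strptime(s, "%a %b %d %H:%M:%S %Y") for s in stamps]

# Seconds between each timestamp and the one before it
diffs = [(later - earlier).total_seconds()
         for earlier, later in zip(times, times[1:])]
print(diffs)  # [1.0, 2.0]

# Pitfall: timedelta.seconds is only the seconds component (0..86399) and
# ignores days, so a negative difference looks like a large positive one
backwards = times[0] - times[1]
print(backwards.seconds)          # 86399
print(backwards.total_seconds())  # -1.0
```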


Solution

  • You can parse the log file, store each SR number together with its timestamp, and then work with that data to get the time differences. The following should be a decent start:

    from itertools import combinations
    from itertools import permutations  # use instead of combinations if order matters
    from collections import OrderedDict
    from datetime import datetime
    import re
    
    
    sr_numbers = []
    dates = []
    
    # Loop through the file, collect the SR numbers and their timestamps,
    # and save them in two parallel lists.
    # This version assumes the SR headers and the dates are wrapped in "**".
    pattern = re.compile(r"(.*)\*{2}(.*)\*{2}(.*)")
    with open('/Path/to/log/file') as log_file:
        for line in log_file:
            if '**' not in line:
                continue
            # Keep only the text between the asterisks
            value = pattern.sub(r"\2", line.strip())
            if 'SR' in line:
                sr_numbers.append(value)
            else:
                dates.append(datetime.strptime(value, '%a %b  %d %H:%M:%S %Y'))
    
    # Map each SR number to its timestamp; OrderedDict preserves the order
    # in which the SR numbers appear in the file
    log_dict = OrderedDict(zip(sr_numbers, dates))
    
    # Compute the difference for every pair of SR numbers (use permutations
    # instead of combinations if order matters). total_seconds() is used
    # because timedelta.seconds ignores the days component.
    time_differences = {"{} - {}".format(a, b): int((log_dict[b] - log_dict[a]).total_seconds())
                        for a, b in combinations(log_dict, 2)}
    
    print(time_differences)
    
    # {'SR 284 - SR 299': 2, 'SR 111 - SR 284': 1, 'SR 111 - SR 299': 3}
    

    Edit:

    Parsing the file without relying on the asterisks around the dates:

    from itertools import combinations
    from itertools import permutations  # use instead of combinations if order matters
    from collections import OrderedDict
    from datetime import datetime
    import re
    
    
    sr_numbers = []
    dates = []
    
    # Loop through the file, collect the SR numbers and their timestamps,
    # and save them in two parallel lists.
    # Both patterns match with or without surrounding asterisks, and the date
    # pattern works whether the columns around it are separated by spaces or tabs.
    sr_pattern = re.compile(r"\bSR \d+\b")
    date_pattern = re.compile(r"[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}")
    
    with open('/Path/to/log/file') as log_file:
        for line in log_file:
            sr_match = sr_pattern.search(line)
            if sr_match:
                sr_numbers.append(sr_match.group(0))
                continue
            date_match = date_pattern.search(line)
            if date_match:
                dates.append(datetime.strptime(date_match.group(0), '%a %b  %d %H:%M:%S %Y'))
    
    
    # Map each SR number to its timestamp; OrderedDict preserves the order
    # in which the SR numbers appear in the file
    log_dict = OrderedDict(zip(sr_numbers, dates))
    
    # Compute the difference for every pair of SR numbers (use permutations
    # instead of combinations if order matters). total_seconds() is used
    # because timedelta.seconds ignores the days component.
    time_differences = {"{} - {}".format(a, b): int((log_dict[b] - log_dict[a]).total_seconds())
                        for a, b in combinations(log_dict, 2)}
    
    print(time_differences)
    
    # {'SR 284 - SR 299': 2, 'SR 111 - SR 284': 1, 'SR 111 - SR 299': 3}
    

    I hope this proves useful.
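The code above pairs every SR number with every other one and does not look at `My_testNumber`, while the question asks for the gap between consecutive SR entries within each test number. A minimal sketch of that variant follows. It assumes each group of SR entries is preceded by a `My_testNumber: N` line and that the first timestamp after an SR header belongs to that SR; `parse_log`, `consecutive_differences` and `SAMPLE_LOG` are hypothetical names, and the sample text is the excerpt from the question.

```python
import re
from collections import OrderedDict
from datetime import datetime

# Excerpt of the log from the question; a real script would iterate over
# an open file object instead of SAMPLE_LOG.splitlines().
SAMPLE_LOG = """\
My_testNumber: 14, JobType = testx.

ABC 2234

**SR 111**
1483529571  1   1   Wed Jan  4 11:32:51 2017    0   4
    datatype someRandomValue
SourceCode.Cpp 588

DBConnection failed

TB 132


**SR 284**
1483529572  0   1   Wed Jan  4 11:32:52 2017    5010400     4
    datatype someRandomXX
SourceCode2.cpp 455

DBConnection Success

TB 102

**SR 299**

1483529572  0   1   **Wed Jan  4 11:32:54 2017**    5010400     4
    datatype someRandomXX
SourceCode3.cpp 455

ConnectionManager Success
"""

TEST_RE = re.compile(r"My_testNumber:\s*(\d+)")
SR_RE = re.compile(r"\bSR \d+\b")
# e.g. "Wed Jan  4 11:32:51 2017", with or without surrounding asterisks
DATE_RE = re.compile(r"[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}")


def parse_log(lines):
    """Return {test_number: [(sr_number, datetime), ...]} in file order."""
    tests = OrderedDict()
    current_test = current_sr = None
    for line in lines:
        test_match = TEST_RE.search(line)
        if test_match:
            current_test = test_match.group(1)
            tests[current_test] = []
            continue
        sr_match = SR_RE.search(line)
        if sr_match:
            current_sr = sr_match.group(0)
            continue
        date_match = DATE_RE.search(line)
        if date_match and current_test is not None and current_sr is not None:
            stamp = datetime.strptime(date_match.group(0), "%a %b %d %H:%M:%S %Y")
            tests[current_test].append((current_sr, stamp))
            current_sr = None  # only the first timestamp after an SR header counts
    return tests


def consecutive_differences(entries):
    """Seconds between each SR entry and the previous one in the same test."""
    return [("{} - {}".format(prev_sr, sr), (t - prev_t).total_seconds())
            for (prev_sr, prev_t), (sr, t) in zip(entries, entries[1:])]


if __name__ == "__main__":
    for test, entries in parse_log(SAMPLE_LOG.splitlines()).items():
        print("My_testNumber {}: {}".format(test, consecutive_differences(entries)))
```

Run on the sample under Python 3, it prints `My_testNumber 14: [('SR 111 - SR 284', 1.0), ('SR 284 - SR 299', 2.0)]`. The sketch reads the human-readable date rather than the leading epoch column, because in the sample the epoch value for SR 299 (1483529572) disagrees with its printed time (11:32:54), and the expected 2-second gap comes from the printed time.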