Parsing a mixed-data log file

I am trying to read and parse a big log file. The log file contains mixed data types (see the example file log.txt below), and I want to extract the min and max values of each category.

log.txt

header: 
seq: 21925
secs: 1603441909
nsecs: 503731023
data_max: 20.0
data_a: [inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, 5.611999988555908, 4.644999980926514, 4.689000129699707, 4.7179999351501465, 4.765999794006348, 4.789999961853027, 0.003000000026077032, 0.001000000026077032, 0.003000000026077032]
data_b: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, inf, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 387.0, 341.0, 0.0, 0.0, 0.0, 0.0, 441.0, 300.0, 302.0, 911.0, 320.0, 334.0, 346.0, 354.0, 359.0, 360.0, 397.0, 418.0, 348.0, 344.0, 342.0, 340.0, 334.0, 333.0, 326.0, 323.0, 322.0, 314.0, 305.0, 305.0, 296.0, 290.0, 283.0, 309.0, 284.0, 272.0, 265.0, 0.0, 0.0, 0.0]
header: 
seq: 21926
secs: 1603412219
nsecs: 523715525
data_max: 20.0
data_a: [inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, 12.448999881744385, 4.4770002365112305, 4.513000011444092, 4.546999931335449, 4.571000099182129, 4.61299991607666, 4.64900016784668, 4.690000057220459, 4.711999893188477, 4.763999938964844, 0.003000000026077032, 0.001000000026077032, 0.003000000026077032, 0.003000000026077032]
data_b: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 325.0, 321.0, 475.0, 640.0, 375.0, 339.0, 322.0, 309.0, 304.0, 304.0, 382.0, 336.0, 0.0, 0.0, 0.0, 307.0, 292.0, 0.0, 321.0, 388.0, 341.0, 0.0, 0.0, 0.0, 0.0, 436.0, 302.0, 303.0, 309.0, 320.0, 338.0, 345.0, 354.0, 361.0, 362.0, 397.0, 415.0, 348.0, 343.0, 340.0, 337.0, 335.0, 333.0, 325.0, 318.0, 317.0, 311.0, 310.0, 985.0, 296.0, 289.0, 281.0, 309.0, 985.0, 268.0, 0.0, 0.0, 0.0, 0.0]

Order: seq, secs, nsecs, min of data_a, max of data_a, min of data_b, max of data_b

output.txt

21925, 1603441909, 503731023, 0.001000000026077032, 5.611999988555908, 0.0, 911.0
21926, 1603412219, 523715525, 0.001000000026077032, 12.448999881744385, 0.0, 985.0

```
def parse_file():
    with open('log.txt', 'r') as infile:
        for line in infile:
            chunks = line.split('header:\n')
            for chunk in chunks[1:]:
                lines = chunk.strip().splitlines()
                print(lines)
```

The problem is that I get an empty list. What is the root cause? How can I parse the log file and extract the information exactly as in output.txt?

Solution

You are mixing up a couple of Python concepts. Iterating over a file object yields the file's contents one line at a time, so the two snippets below are equivalent:

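```
# Version 1: iterate over the file object directly
with open('log.txt', 'r') as infile:
    for line in infile:
        print(line)
# Version 2: a fresh file object (the first one is exhausted after one pass)
with open('log.txt', 'r') as infile:
    lines = infile.readlines()
    for line in lines:
        print(line)
```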

That means your `line` variable holds a single line of the file at each iteration, so splitting it on `header:\n` can never give you the multi-line record chunks you expect.
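You can see this without the real file by looping over an `io.StringIO` object, an in-memory file that behaves the same way when iterated. The three-line `sample` string is a shortened copy of your log, used here only for illustration:

```python
import io

# A shortened, in-memory stand-in for log.txt (illustrative sample only).
sample = "header:\nseq: 21925\nsecs: 1603441909\n"

lines_seen = []
with io.StringIO(sample) as infile:
    # Each iteration yields exactly one line, trailing newline included.
    for line in infile:
        lines_seen.append(line)
        print(repr(line))
# Prints:
# 'header:\n'
# 'seq: 21925\n'
# 'secs: 1603441909\n'
```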

Let's have a look line by line at your code to understand what's happening:

```
with open('log.txt', 'r') as infile: 
```
You open a context in which `infile` is a file object for log.txt.
```
    for line in infile: 
```
You loop over the file object, which actually loops over each line of the file; the variable `line` takes, in turn, the following values:
- header:\n
- seq: 21925\n
- secs: 1603441909\n
- nsecs: 503731023\n
- data_max: 20.0\n
- ...
```
   chunks = line.split('header:\n')
```
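By splitting the line on the string `header:\n`, you build a new list from the current value of `line`. Assuming the header line is exactly `header:` followed by a newline, `chunks` will look like this:
- `['', '']` for the line `header:\n`, because the separator is the whole line, leaving an empty string on each side of it
- `['seq: 21925\n']` (and similar) for every other line, because the separator does not occur, so the list just holds the unchanged line
```
for chunk in chunks[1:]:
```
Here you loop over `chunks` starting from the second element (`[1:]`). For every line other than the header, `chunks` has a single element, so `chunks[1:]` is an empty list and the loop body never runs. For the header line, `chunks[1:]` is `['']`, so the body runs once on an empty string; `''.strip().splitlines()` is `[]`, which is the empty list you see printed. (If the header line actually ends with a space, as `header: ` in your sample suggests, the separator never matches and nothing is printed at all.) Either way, the inner loop never sees the lines of a record.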
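A quick check in the interpreter confirms this, again assuming the header line is exactly `header:\n` with no trailing space:

```python
# What line.split('header:\n') returns for two lines of the log.
header_line = "header:\n"
seq_line = "seq: 21925\n"

header_chunks = header_line.split("header:\n")
seq_chunks = seq_line.split("header:\n")

print(header_chunks)   # ['', '']
print(seq_chunks)      # ['seq: 21925\n']

# For the header line, chunks[1:] is [''], so the inner loop runs once
# on an empty string, and ''.strip().splitlines() is an empty list.
for chunk in header_chunks[1:]:
    print(chunk.strip().splitlines())   # []

# For any other line, chunks[1:] is empty, so the inner loop never runs.
print(seq_chunks[1:])  # []
```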

A possible (and not optimised) implementation of what you want could be:


```
def parse_file():
    # one output line per record
    out = []
    with open('log.txt', 'r') as infile:
        # values collected for the current record
        current = []
        # loop through each line of the document
        for raw_line in infile:
            # remove surrounding whitespace, including the end-of-line character
            line = raw_line.strip()
            if line == "header:":
                # a new record starts: flush the previous one, if any
                if current:
                    out.append(", ".join(current))
                # reset current
                current = []
            elif line:
                # split only on the first ": " to get key and value
                key, val = line.split(": ", 1)
                if key in ["seq", "secs", "nsecs"]:
                    current.append(val)
                elif key in ["data_a", "data_b"]:
                    # parse the list by removing [] and splitting on ", "
                    raw_values = val[1:-1].split(", ")
                    values = []
                    # convert each value to float, skipping inf
                    for value in raw_values:
                        if "inf" in value:
                            continue
                        values.append(float(value))
                    # add min and max, converted back to str
                    current.append(str(min(values)))
                    current.append(str(max(values)))
        # add the last record
        if current:
            out.append(", ".join(current))
    return "\n".join(out)
```
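If you would rather keep the idea behind your original attempt (one chunk per `header:`), read the whole file into a single string first and split that string, not each line. Below is a minimal sketch of that approach. `parse_log_text` and the shortened `sample` are hypothetical names for illustration, and it assumes every record contains the `seq`, `secs`, `nsecs`, `data_a` and `data_b` keys and that each data list holds at least one finite value. It uses `math.isinf` to drop the `inf` entries instead of a substring test, and ignores `data_max`.

```python
import math

def parse_log_text(text):
    """Return one 'seq, secs, nsecs, min_a, max_a, min_b, max_b' line per record."""
    rows = []
    # Splitting the whole text (not a single line) on "header:" yields one chunk
    # per record; the first element is whatever precedes the first header.
    for chunk in text.split("header:")[1:]:
        record = {}
        for line in chunk.strip().splitlines():
            key, _, val = line.partition(": ")
            record[key] = val.strip()
        fields = [record["seq"], record["secs"], record["nsecs"]]
        for key in ("data_a", "data_b"):
            # "[inf, 1.0, 2.0]" -> [inf, 1.0, 2.0]; float() accepts "inf" and spaces
            values = [float(v) for v in record[key].strip("[]").split(",")]
            finite = [v for v in values if not math.isinf(v)]
            fields += [str(min(finite)), str(max(finite))]
        rows.append(", ".join(fields))
    return "\n".join(rows)

# Shortened record in the same layout as log.txt (illustrative values only).
sample = """header:
seq: 21925
secs: 1603441909
nsecs: 503731023
data_max: 20.0
data_a: [inf, 5.611999988555908, 0.001000000026077032]
data_b: [0.0, inf, 911.0, 300.0]
"""
print(parse_log_text(sample))
# 21925, 1603441909, 503731023, 0.001000000026077032, 5.611999988555908, 0.0, 911.0
```

For the real file, you could read it with `with open('log.txt') as f: text = f.read()` and write `parse_log_text(text)` to output.txt. Reading everything at once is simple and fine for moderately large logs; the line-by-line version avoids loading the whole raw text into memory.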