
Reading in data from a log file in Python


I am trying to parse log entries that follow a recurring pattern and write each entry to its own file using Python. All log entries have the general format:

ProcessID= abc
.
.
.
.
.
Size=76 bytes
EOE
------------------------------------------------------------------------

StartTime=abc
.
.
.
.
.
Size=76 bytes
EOE
------------------------------------------------------------------------

DifferentParameter=abc
.
.
.
.
.
Size=76 bytes
EOE
------------------------------------------------------------------------

Each entry has a different number of parameters. Essentially, I need to parse only 2 of the parameters and map them together, but not every entry has both parameters. My first goal is therefore to split the entries into separate files (or to use a better way of splitting entries, if someone knows one), and then to process each entry further using regex or something similar.

So far I've got the following bit of code to try to parse 10 log entries, but I'm not entirely sure how to handle the case where it finds the EOE line and then has to move on to the next line.

rf = open('data.txt', 'r')
lines = rf.readLine()
rf.close()

i = 0

while i != 10:

for line in lines:
    while(line.find('EOE') == -1):
        with open('data'+(i)+'.txt', 'w') as wf:
            wf.write(line)

    file.seek(1,1)    
    i+=1

rf.close()  


Solution

  • In my opinion, the layout of the log file itself causes some problems. You try to split on EOE, but then some files will start with the "-----" line and others will not (in particular, the "ProcessID" entry will not have the "-----" line at the beginning). So why not split on the "-----" line instead? The second problem is empty lines: some files would start with empty lines and others would not. This needs to be taken into account.

    I tried to solve all these problems and produce files that share the same format, contain no empty lines, and start with the line that contains the "=".

    I called the input file "log_stack.txt".

    Here is my humble solution:

    with open("log_stack.txt") as f:
        read_data = f.readlines()
    
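    s = ""       # lines collected for the current entry
    counter = 1  # number used in the next output file name

    for line in read_data:
        if "---" not in line:
            # Skip blank lines; keep everything else.
            if line.strip():
                s += line
        else:
            # A dashed line ends the entry: write it out and start over.
            if s:
                with open("file_{}.txt".format(counter), "w") as out:
                    out.write(s)
                counter += 1
            s = ""

    # Write a final entry that has no dashed line after it.
    if s:
        with open("file_{}.txt".format(counter), "w") as out:
            out.write(s)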
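
If the end goal is only to map two parameters together, the intermediate files may not be needed at all. The following is a minimal sketch of that idea, not a tested solution for the real log: it assumes every parameter line has the form "Name=value", that entries end with a line starting with "---", and that ProcessID and StartTime are the two parameters of interest (the question does not say which two are needed). The names SAMPLE_LOG, split_entries and map_parameters are made up for this illustration.

```python
import re

# Hypothetical sample that mirrors the entry layout shown in the question.
SAMPLE_LOG = """\
ProcessID= abc
StartTime=10:00
Size=76 bytes
EOE
------------------------------------------------------------------------

StartTime=11:30
Size=76 bytes
EOE
------------------------------------------------------------------------
"""

# Matches "Name=value" or "Name= value"; lines without "=" (EOE, ".") do not match.
PARAM_RE = re.compile(r"^\s*(\w+)\s*=\s*(.*?)\s*$")


def split_entries(text):
    """Return one dict of parameters per entry, splitting on dashed lines."""
    entries = []
    current = {}
    for line in text.splitlines():
        if line.startswith("---"):
            # A dashed line closes the current entry.
            if current:
                entries.append(current)
            current = {}
            continue
        match = PARAM_RE.match(line)
        if match:
            current[match.group(1)] = match.group(2)
    if current:
        # Keep a last entry that has no dashed line after it.
        entries.append(current)
    return entries


def map_parameters(entries, key_name, value_name):
    """Map key_name to value_name, using only entries that contain both."""
    return {
        entry[key_name]: entry[value_name]
        for entry in entries
        if key_name in entry and value_name in entry
    }


entries = split_entries(SAMPLE_LOG)
mapping = map_parameters(entries, "ProcessID", "StartTime")
print(mapping)  # prints {'abc': '10:00'}
```

For a real file, the text could be read with open("log_stack.txt").read() and passed to split_entries. Entries that lack either parameter are skipped rather than raising an error, which matches the remark in the question that not every entry has both.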