Search code examples
python-3.xregexpandaspython-datetimelogfile

.logs file to .csv file wih pandas


    33 HEIGHT  5 5 INFO ABSENT FROM SCHOOL 123 

    He has been absent for 48 hours

                                                                     

     NOLAN Time:2020-07-14 12:34:08 PAUL

So I have a log file that looks like this above. This is a sample of the logs. And they all follow this format. I was able to break this down into my different fields. I was thinking if anyone could help with this in pandas.

these are the various columns and rows:

COUNT|HEIGHT|MESSAGE_TITLE |BODY |SCHHO|DATETIME |NAME

33 | 5.5 |INFO ABSENT FROM SCHOOL 123|He was...|NOLAN|Time:2020-07-14 12:34:08|PAUL

I have started by haven't made any progress..

import pandas as PD import NumPy as np import glob import os import JSON import wx from DateTime import DateTime import DateTime as dt import time as t

import matplotlib.pyplot as plt

import xlsxwriter

with open('STUDENT_PROFILE.log') as f:

log = f.readlines()
print(log)

len(log)

n = len(log)

for i in range(n):

log[i] = log[i].strip()

log[6].split(']')

log[6].split(']')[7]

log[6].split(']')[7].upper()

s = log[0].split(']')[0].strip('[')

dtfmt ='%Y-%m-%d %I:%M:%S' # %H -> 24 hours, %I-> 12 hours

dt2 = datetime.strptime(s,dtfmt)

for line in log:

s1=line.split(']')[0].strip('[')

dt2 = dt.strptime(s1, dtfmt)

col1.append(dt2)

s= line.split(']')[1].strip().split(':')

col2.append(s[0])

if len(s) == 7:

    col3.append(s[1])
    col4.append(s[2])
    col5.append(s[3])
    col6.append(s[4])
    col7.append(s[5])
    col8.append(s[6])

else:

    col8.append(np.nan)

df = pd.DataFrame([col1,col2,col3,col4,col5,col6,col7,col8])

df = df.T

df.columns=['COUNT','HEIGHT','MESSAGE_TITLE','BODY','SCHHOL','DATETIME','NAME']

df['delta_t'] =df.datetime - df.datetime[0]

df['delta_t_seconds'] = 0

for i in range(df.shape[0]):

df.ix[i,'delta_t_seconds'] = df.delta_t.iloc[i].seconds

df.to_csv('test_log.csv', index=False)

Any ideas...


Solution

  • Provided no constraints, this should work. Note the part where there is 5 5, you'll need to handle adding the period. Regex simply matches a set of patterns; as such its job isn't to add or remove characters.

    Also, I noticed that you're reading in lines and iterating over them. For this pattern to work, it assumes one single multi-line string, so you'll need to join the array with newline characters.

    This is by no means an elegant solution and by no means a bulletproof one with limited example input or constraints.

    Pattern

    \s+(\d+)\s+HEIGHT\s+(\d\s\d{1,2})\s+([^\n\r]+)\W+([^\n]+)\W+([A-Z]+)\sTime:(\S+\s+\S+)\s+([A-Z]+)
    

    See the demo here.