I am trying to transform a log file that looks like this:
Name: AGV
Version: 1.0.00
Revision: 0000000000
Build date: 2000-00-00 00:00:00
Continuation of previous file
[1639992888.497] [B62FF420] [DEBUG Wings.cpp:222] Current sidewing pressure: 3410
[1639992888.497] [B62FF420] [DEBUG Wings.cpp:222] Current sidewing pressure: 4206
[1639992888.517] [B62FF420] [DEBUG Wings.cpp:222] Current sidewing pressure: 3433
[1639992888.517] [B62FF420] [DEBUG Wings.cpp:222] Current sidewing pressure: 4229
[1639992888.527] [B62FF420] [INFO Position.cpp:438] <AGVPOS> 602, 7787.496,
into a CSV file.
I have tried removing the first few lines, which I don't need, adding the column names manually, and then doing this:
import pandas as pd

df = pd.read_fwf('data.log')
df.to_csv('data.csv', index=False)
This worked for the first log file, but not for the other files, as I get some additional columns in each of them.
The output I want to get is something like this:
Timestamp, Code, Message
[1639992888.497] [B62FF420] [DEBUG Wings.cpp:222] Current sidewing pressure: 3410
[1639992888.497] [B62FF420] [DEBUG Wings.cpp:222] Current sidewing pressure: 4206
[1639992888.517] [B62FF420] [DEBUG Wings.cpp:222] Current sidewing pressure: 3433
[1639992888.517] [B62FF420] [DEBUG Wings.cpp:222] Current sidewing pressure: 4229
[1639992888.527] [B62FF420] [INFO Position.cpp:438] <AGVPOS> 602, 7787.496,
My method is definitely not the most efficient; is there some other way I can do this?
Thank you.
According to your comment, this is the best approach (you will have to do some cleaning of the data afterwards, but it should work):
import pandas as pd

df = pd.read_csv('test_fwf.log', skiprows=7, sep=r'(?:\]\s+\[)',
                 engine='python', names=['timestamp', 'code', 'message'])
read_csv can read a .log file because it is still a plain text file. The sep parameter can take a regular expression; the pattern I selected splits on the '] [' sequence you have in each line, so the result should always have three columns. The names parameter sets the names of the columns you'd like to obtain, and the skiprows parameter lets you skip the first n rows of your input file.
Note that this regex should work with files that have multiple spaces between the separators; if you are certain the separator is a tab character, you must update the regex accordingly.
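Putting it together, here is a minimal sketch of the read-plus-cleanup step, assuming the log layout shown in the question; the file name 'test_fwf.log', skiprows=7 and the column names are just the values used above, so adjust them to your own files.

import pandas as pd

# Read the log, skipping the header lines before the first log entry.
df = pd.read_csv(
    'test_fwf.log',
    skiprows=7,                  # adjust to the number of header lines in your file
    sep=r'(?:\]\s+\[)',          # split on the '] [' between the bracketed fields
    engine='python',             # a regex separator requires the python engine
    names=['timestamp', 'code', 'message'],
)

# Cleanup: the first column keeps a leading '[' and the message column keeps
# the closing ']' of the third bracket, e.g. "DEBUG Wings.cpp:222] Current ...".
df['timestamp'] = df['timestamp'].str.lstrip('[')
df['message'] = df['message'].str.replace(']', '', n=1, regex=False)

df.to_csv('data.csv', index=False)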