python pandas dataframe export-to-csv

Transform a log file to csv using pandas


I am trying to transform a log file that looks like this

      Name: AGV
   Version: 1.0.00
  Revision: 0000000000
Build date: 2000-00-00 00:00:00

Continuation of previous file

[1639992888.497] [B62FF420] [DEBUG   Wings.cpp:222] Current sidewing pressure: 3410
[1639992888.497] [B62FF420] [DEBUG   Wings.cpp:222] Current sidewing pressure: 4206
[1639992888.517] [B62FF420] [DEBUG   Wings.cpp:222] Current sidewing pressure: 3433
[1639992888.517] [B62FF420] [DEBUG   Wings.cpp:222] Current sidewing pressure: 4229
[1639992888.527] [B62FF420] [INFO    Position.cpp:438] <AGVPOS> 602, 7787.496, 

To a csv file.

I removed the first few lines, which I don't need, added the column names manually, and then ran this:

df = pd.read_fwf('data.log')
df.to_csv('data.csv', index=None)

This worked for the first log file, but not for the others: each of them produces some additional columns.

The output I want looks something like this:

Timestamp.       Code.      Message  
[1639992888.497] [B62FF420] [DEBUG   Wings.cpp:222] Current sidewing pressure: 3410
[1639992888.497] [B62FF420] [DEBUG   Wings.cpp:222] Current sidewing pressure: 4206
[1639992888.517] [B62FF420] [DEBUG   Wings.cpp:222] Current sidewing pressure: 3433
[1639992888.517] [B62FF420] [DEBUG   Wings.cpp:222] Current sidewing pressure: 4229
[1639992888.527] [B62FF420] [INFO    Position.cpp:438] <AGVPOS> 602, 7787.496, 

My method is definitely not the most efficient. Is there some other way I can do this?

Thank you.


Solution

  • According to your comment, this is the best approach (you will have to clean the data afterwards, but it works):

    import pandas as pd
    
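    # Raw string so the regex backslashes are not treated as string escapes;
    # a regex separator requires the python engine.
    df = pd.read_csv('test_fwf.log', skiprows=7, sep=r'(?:\]\s+\[)', engine='python', names=['timestamp', 'code', 'message'])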
    

    Explanation

    read_csv can read a .log file because it is still a plain text file, and its sep parameter (alias delimiter) accepts a regular expression when engine='python' is used. The pattern I chose to separate the fields is the '] [' sequence that appears in each line, so the result should always have 3 columns. The names parameter gives the names of the columns you'd like to obtain.
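To see what the separator does to a single line, here is a small sketch that uses re.split directly instead of pandas. It only approximates how read_csv applies the pattern, and the sample line is copied from the question.

```python
import re

# Same separator pattern as in the read_csv call:
# a closing bracket, one or more whitespace characters, an opening bracket.
SEP = r'\]\s+\['

line = '[1639992888.497] [B62FF420] [DEBUG   Wings.cpp:222] Current sidewing pressure: 3410'

# The '] ' after 'Wings.cpp:222' is not followed by '[', so it is not a split point.
fields = re.split(SEP, line)

for field in fields:
    print(repr(field))
```

The first field keeps its leading '[' and the message keeps the ']' after the source location, which is the cleaning mentioned above.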

    The skiprows parameter lets you skip the first n rows of the input file (here, the 7 header lines before the first log record).

    Note that this regex tolerates any amount of whitespace between the brackets, because \s+ matches one or more whitespace characters (tabs included). If your files use a different separator, you must update the regex accordingly.
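If the header length differs between files, a fixed skiprows value may not fit all of them. As an alternative (not the read_csv approach above), here is a minimal sketch that matches each line against a regex and ignores anything that does not look like a log record. The names LINE_RE and parse_log are made up for illustration, and the pattern assumes every record starts with a numeric timestamp and an alphanumeric code in brackets, as in the sample.

```python
import io
import re

import pandas as pd

# Assumed record layout: [timestamp] [code] [LEVEL source:line] text
LINE_RE = re.compile(
    r'^\[(?P<timestamp>[\d.]+)\]\s+\[(?P<code>\w+)\]\s+(?P<message>\[.*)$'
)


def parse_log(lines):
    """Build a DataFrame from the lines that match LINE_RE; skip the rest."""
    rows = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # header and blank lines do not match and are dropped
            rows.append(match.groupdict())
    return pd.DataFrame(rows, columns=['timestamp', 'code', 'message'])


# Sample input taken from the question; a real file would be opened with open().
sample_log = """\
      Name: AGV
   Version: 1.0.00
  Revision: 0000000000
Build date: 2000-00-00 00:00:00

Continuation of previous file

[1639992888.497] [B62FF420] [DEBUG   Wings.cpp:222] Current sidewing pressure: 3410
[1639992888.527] [B62FF420] [INFO    Position.cpp:438] <AGVPOS> 602, 7787.496,
"""

df = parse_log(io.StringIO(sample_log))

# Returns the CSV as a string; pass a path such as 'data.csv' to write a file instead.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Because the message can contain commas, to_csv quotes that field, so the file still has exactly three columns when read back.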