Search code examples
pythonpython-3.xpowershellcsvpowershell-3.0

Merge CSV columns with irregular timestamps and different header names per file


I have long CSV files with different headers in every file. The first column is always a timestamp which is irregular with its timings, so it rarely matches.

file1.csv

time,L_pitch,L_roll,L_yaw
2020-08-21T09:58:07.570,-0.0,-6.1,0.0
2020-08-21T09:58:07.581,-0.0,-6.1,0.0
2020-08-21T09:58:07.591,-0.0,-6.1,0.0
....

file2.csv

time,R_pitch,R_roll,R_yaw
2020-08-21T09:58:07.591,1.3,-5.7,360.0
2020-08-21T09:58:07.607,1.3,-5.7,360.0
2020-08-21T09:58:07.617,1.3,-5.7,360.0
....

file3.csv

time,L_accel_lat,L_accel_long,L_accel_vert
2020-08-21T09:58:07.420,-0.00,-0.00,0.03
2020-08-21T09:58:07.430,-0.00,0.00,0.03
2020-08-21T09:58:07.440,-0.00,0.00,0.03
....

At the moment there can be up to 6 CSV files in that format in a folder.

I would like to merge these CSV into one file where all columns are recognized and sorted according to the timestamps. When timestamps are matching, data gets merged into its corresponding line. If time is not matched, it gets a separate line with empty fields.

The result should look like this.

time,L_pitch,L_roll,L_yaw,R_pitch,R_roll,R_yaw,L_accel_lat,L_accel_long,L_accel_vert
2020-08-21T09:58:07.420,,,,,,,-0.00,-0.00,0.03
2020-08-21T09:58:07.430,,,,,,,-0.00,0.00,0.03
2020-08-21T09:58:07.440,,,,,,,-0.00,0.00,0.03
....
2020-08-21T09:58:07.581,-0.0,-6.1,0.0,,,,,,
2020-08-21T09:58:07.591,-0.0,-6.1,0.0,1.3,-5.7,360.0,,,

Last line would be an example of a matching timecode and with this also datamerging into one line

So far I tried this Github Link, but this merges with filenames into the CSV and no sorting. Panda in Python seems to be up to the task, but my skills are not. I also tried some python files from GitHub...

This one seemed the most promising with changing the user, but it runs with no end (files to big?).

Is this possible to do this in a PowerShell ps1 or a somewhat (for me) "easy" python script? I would build this into a batch file to work in several folders.

Thanks in advance

goam


Solution

  • As you mentioned, you can solve your problems rather conveniently using pandas.

    import pandas as pd
    import glob
    
    tmp=[]
    for f in glob.glob("file*"):
        print(f)
        tmp.append(pd.read_csv(f, index_col=0, parse_dates=True))
    
    pd.concat(tmp,axis=1,sort=True).to_csv('merged')
    

    Some explanation:

    Here, we use glob to get the list of files using the wildcard pattern file*. We loop over this list and read each file using pandas read_csv. Note, we parse the dates of the file (converts to dtype datetime64[ns]) and use the date column as an index of the dataframe. We store the dataframes in a list called tmp. Finally we concatinate the individual dataframes (of the individual files) in tmp using concat and immediately write it to a file called merged.csv using pandas to_csv.