Searched myself silly, but couldn't find the answer.
Basically I want to import a number of GPS files with the aim to know the location of each GPS at any given time.
I wanted to use Panda's datetime index for this. What I can't seem to figure out is how to align this data.
My result is that each gps starts a new timedate index, I think I'm overwriting my timedata with every import.
I've tried creating a df outside the for loop first, but not with great results.
This is my code:
import pandas as pd
import glob
import os
from datetime import datetime
from pandas import ExcelWriter
pattern = '*.csv'
csv_files = glob.glob(pattern)
frames = []
for csv in csv_files:
with open(csv) as fp:
skip = next(filter(
lambda x: x[1].startswith('trkpt'),
enumerate(fp)
))[0] + 1
df = pd.read_csv(csv, usecols = ['lat','lon','ele','time'], parse_dates=['time'], skiprows=skip)
df['DateTime'] = pd.to_datetime(df['time'], format='%Y-%m-%d %H:%M:%S')
df = df.set_index('DateTime')
df.rename(columns={'lat':'lat' + ' ' + csv,'lon':'lon' + ' ' + csv,'ele':'ele' + ' ' + csv}, inplace=True)
df.drop(['time'], axis=1, inplace=True)
frames.append(df)
df = pd.concat(frames)
df.to_csv('GPS Export.csv', sep=',')
File example
trkpt
ID trksegID lat lon ele time
1 1 -32.46226206 116.0619373 311.6 2021-01-22T01:54:03Z
2 1 -32.46225444 116.0619245 311.6 2021-01-22T01:54:04Z
3 1 -32.46225762 116.0619227 314.97 2021-01-22T01:54:05Z
4 1 -32.46226215 116.0619119 316.41 2021-01-22T01:54:06Z
5 1 -32.46226123 116.0618896 317.85 2021-01-22T01:54:07Z
6 1 -32.46225611 116.0618791 317.85 2021-01-22T01:54:08Z
7 1 -32.46224949 116.0618693 316.41 2021-01-22T01:54:09Z
8 1 -32.46224086 116.0618602 314.97 2021-01-22T01:54:10Z
9 1 -32.46223943 116.0618525 314.49 2021-01-22T01:54:11Z
10 1 -32.46225385 116.0618722 314.49 2021-01-22T01:54:12Z
also got a small problem with the date formatting, but I can live with that
The solution would be to set the datetime index after concatenating the files. The snippet below assumes that all csv's are formatted similarly to your csv snippet and skips the first row with trkpt
. It also adds a column with the csv filename since it appears you wish to do some renaming or postprocessing with the filename.
import glob
import pandas as pd
import os
df = pd.concat([pd.read_csv(fp, skiprows=1).assign(filename=os.path.basename(fp)) for fp in glob.glob('*.csv')])
df['DateTime'] = pd.to_datetime(df['time'], format='%Y-%m-%d %H:%M:%S')
df = df.set_index('DateTime')
df.drop(['time'], axis=1, inplace=True)
Output:
| DateTime | ID | trksegID | lat | lon | ele | filename |
|:--------------------------|-----:|-----------:|---------:|--------:|-------:|:-----------|
| 2021-01-22 01:54:03+00:00 | 1 | 1 | -32.4623 | 116.062 | 311.6 | 2.csv |
| 2021-01-22 01:54:04+00:00 | 2 | 1 | -32.4623 | 116.062 | 311.6 | 2.csv |
| 2021-01-22 01:54:05+00:00 | 3 | 1 | -32.4623 | 116.062 | 314.97 | 2.csv |
| 2021-01-22 01:54:06+00:00 | 4 | 1 | -32.4623 | 116.062 | 316.41 | 2.csv |
| 2021-01-22 01:54:07+00:00 | 5 | 1 | -32.4623 | 116.062 | 317.85 | 2.csv |
| 2021-01-22 01:54:08+00:00 | 6 | 1 | -32.4623 | 116.062 | 317.85 | 2.csv |
| 2021-01-22 01:54:09+00:00 | 7 | 1 | -32.4622 | 116.062 | 316.41 | 1.csv |
| 2021-01-22 01:54:10+00:00 | 8 | 1 | -32.4622 | 116.062 | 314.97 | 1.csv |
| 2021-01-22 01:54:11+00:00 | 9 | 1 | -32.4622 | 116.062 | 314.49 | 1.csv |
| 2021-01-22 01:54:12+00:00 | 10 | 1 | -32.4623 | 116.062 | 314.49 | 1.csv |