I have the following problem:
The date-time columns in my data have the following format (the columns are "Date", dd.mm.yyyy, and "Time", hh:mm:ss.fff,f):
01.03.2019 12:29:15.732,7
I looked around but I couldn't find a formatting-option which deals with the part behind the comma (after the milliseconds). A source which didn't help me: https://docs.python.org/2/library/datetime.html
I am reading the CSV file with Python 3 and pd.read_csv().
I have the following workaround, which truncates the comma and the digit behind it.
It is terribly slow because it truncates each of the more than 50000 strings in my dataset:
data = pd.read_csv('xyz.csv', sep=';', low_memory = False, parse_dates = [['Date', 'Time']],
date_parser = lambda x, y : pd.to_datetime((x + ' ' + y)[:23], format='%d.%m.%Y %H:%M:%S.%f'))
What I want is a format string that deals with the comma, either by discarding the whole fractional-seconds part or by converting it correctly to microseconds.
Sidenote: With R I simply used "%d.%m.%Y %H:%M:%S", which discarded the milliseconds without throwing an error.
ResidentSleeper is correct: you can drop the comma and pass the result to pd.to_datetime().
import pandas as pd
data1 = {'Date': ['01.03.2019 12:29:15.732,7',
'01.03.2019 12:29:15.732,7',
'01.03.2019 12:29:15.732,7',
'01.03.2019 12:29:15.732,7'],
'Value': [1, 2, 3, 4]}
df1 = pd.DataFrame(data1)
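# Drop the comma, then parse with an explicit format so that 01.03.2019 is read
# day-first (1 March) instead of being guessed as month-first (3 January).
df1['Date'] = pd.to_datetime(df1['Date'].str.replace(',', ''), format='%d.%m.%Y %H:%M:%S.%f')
print(df1)
                        Date  Value
0 2019-03-01 12:29:15.732700      1
1 2019-03-01 12:29:15.732700      2
2 2019-03-01 12:29:15.732700      3
3 2019-03-01 12:29:15.732700      4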
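To apply the same idea to the CSV from the question, one option is to read "Date" and "Time" as plain strings and convert the whole column in one vectorized call instead of a per-row date_parser. The following is only a sketch: the in-memory csv_text is a made-up stand-in for 'xyz.csv', and the column names Date_Time and Date_Time_s are hypothetical. It shows both variants asked for, keeping the extra digit as microseconds and discarding the fractional seconds.

```python
import io
import pandas as pd

# Hypothetical stand-in for 'xyz.csv'; the layout follows the question
# (separate "Date" and "Time" columns, ';' as separator).
csv_text = (
    "Date;Time;Value\n"
    "01.03.2019;12:29:15.732,7;1\n"
    "01.03.2019;12:29:16.001,2;2\n"
)

# Read both columns as strings, without any per-row date parsing.
data = pd.read_csv(io.StringIO(csv_text), sep=";", dtype={"Date": str, "Time": str})

# Variant 1: drop the comma so %f sees "7327" (%f accepts up to six digits)
# and the extra digit becomes part of the microseconds.
stamp = data["Date"] + " " + data["Time"].str.replace(",", "", regex=False)
data["Date_Time"] = pd.to_datetime(stamp, format="%d.%m.%Y %H:%M:%S.%f")

# Variant 2: discard the fractional seconds entirely, similar to the R format
# string mentioned in the question.
whole_seconds = data["Date"] + " " + data["Time"].str.split(".").str[0]
data["Date_Time_s"] = pd.to_datetime(whole_seconds, format="%d.%m.%Y %H:%M:%S")

print(data[["Date_Time", "Date_Time_s", "Value"]])
```

Passing an explicit format avoids the day/month ambiguity of dates such as 01.03.2019 and usually parses faster than letting pandas infer the format row by row.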