Tags: python, pandas, datetime, string-conversion

How to convert a datetime string with a comma (,) in the milliseconds part in Python/pandas?


I have the following problem:

The date-time columns in my data have the following format (the columns are "Date", dd.mm.yyyy, and "Time", hh:mm:ss.fff,f):

01.03.2019  12:29:15.732,7

I looked around but couldn't find a format directive that handles the part after the comma (the digit following the milliseconds). A source that didn't help me: https://docs.python.org/2/library/datetime.html

I am reading the CSV file with Python 3 and pd.read_csv().

I have the following workaround, which truncates the comma and the digit after it.

It is terribly slow because it truncates each of the more than 50000 strings in my dataset individually:

data = pd.read_csv('xyz.csv', sep=';', low_memory = False, parse_dates = [['Date', 'Time']], 
                   date_parser = lambda x, y : pd.to_datetime((x + ' ' + y)[:23], format='%d.%m.%Y %H:%M:%S.%f'))

What I want is a format string that handles the comma, either by discarding the whole milliseconds part or by converting it correctly to microseconds.

Side note: In R I simply used "%d.%m.%Y %H:%M:%S", which discarded the milliseconds without raising an error.


Solution

  • ResidentSleeper is correct: you can drop the comma and then use pd.to_datetime().

    import pandas as pd
    
    data1 = {'Date': ['01.03.2019  12:29:15.732,7',
                      '01.03.2019  12:29:15.732,7',
                      '01.03.2019  12:29:15.732,7',
                      '01.03.2019  12:29:15.732,7'], 
            'Value': [1, 2, 3, 4]}
    
    df1 = pd.DataFrame(data1)
    
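    # Drop the comma so "732,7" becomes the fraction ".7327", and pass an explicit
    # format so that 01.03.2019 is read as day.month.year (1 March), not month-first.
    df1['Date'] = pd.to_datetime(df1['Date'].str.replace(',', '', regex=False),
                                 format='%d.%m.%Y %H:%M:%S.%f')
    
    print(df1)
    
                            Date  Value
    0 2019-03-01 12:29:15.732700      1
    1 2019-03-01 12:29:15.732700      2
    2 2019-03-01 12:29:15.732700      3
    3 2019-03-01 12:29:15.732700      4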
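
Applied to the layout from the question (separate "Date" and "Time" columns), the same idea could look roughly like the sketch below. The sample rows and the new column names "Timestamp" and "TimestampSeconds" are made up for illustration. It reads both columns as plain strings and converts them in one vectorized call instead of a per-row date_parser, which should help with speed, though that was not measured here.

```python
import io

import pandas as pd

# Hypothetical sample data mimicking the question's file layout.
csv_text = (
    "Date;Time;Value\n"
    "01.03.2019;12:29:15.732,7;1\n"
    "01.03.2019;12:29:16.001,2;2\n"
)

# Read the date and time columns as plain strings (no date_parser callback).
df = pd.read_csv(io.StringIO(csv_text), sep=";", dtype={"Date": str, "Time": str})

# Join the two columns and drop the comma, so "732,7" becomes the fraction ".7327".
combined = df["Date"] + " " + df["Time"].str.replace(",", "", regex=False)

# Option 1: keep the fraction (0.7327 s is 732700 microseconds).
df["Timestamp"] = pd.to_datetime(combined, format="%d.%m.%Y %H:%M:%S.%f")

# Option 2: discard everything below whole seconds, like the R format string did.
df["TimestampSeconds"] = df["Timestamp"].dt.floor("s")

print(df[["Timestamp", "TimestampSeconds", "Value"]])
```

The explicit format matters here: without it, pandas may read 01.03.2019 month-first, as 3 January.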