Tags: python, pandas, datetime, type-conversion, timestamp

pandas datetime to unix timestamp seconds


The official documentation for pandas.to_datetime says:

unit : string, default ‘ns’

unit of the arg (D,s,ms,us,ns) denote the unit, which is an integer or float number. This will be based off the origin. Example, with unit=’ms’ and origin=’unix’ (the default), this would calculate the number of milliseconds to the unix epoch start.

So I tried it like this:

import pandas as pd
df = pd.DataFrame({'time': [pd.to_datetime('2019-01-15 13:25:43')]})
df_unix_sec = pd.to_datetime(df['time'], unit='ms', origin='unix')
print(df)
print(df_unix_sec)

                 time
0   2019-01-15 13:25:43
0   2019-01-15 13:25:43
Name: time, dtype: datetime64[ns]

The output does not change for the second one. It always shows the datetime value, not the number of milliseconds since the unix epoch start. Why is that? Am I missing something?


Solution

  • I think you misunderstood what the argument is for. The purpose of origin='unix' is to convert an integer timestamp to a datetime, not the other way around.

    pd.to_datetime(1.547559e+09, unit='s', origin='unix') 
    # Timestamp('2019-01-15 13:30:00')
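
To make the direction of each call concrete, here is a minimal round-trip sketch using the timestamp from the question. The variable names (ts, seconds, restored) are illustrative only. The forward step is plain arithmetic on the datetime; only the reverse step goes through pd.to_datetime with unit and origin.

```python
import pandas as pd

# Forward: datetime -> whole seconds since 1970-01-01 (plain arithmetic).
ts = pd.Timestamp("2019-01-15 13:25:43")
seconds = (ts - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")

# Reverse: number -> datetime. This is what unit= and origin= are for.
restored = pd.to_datetime(seconds, unit="s", origin="unix")

print(seconds)   # 1547558743
print(restored)  # 2019-01-15 13:25:43
```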
    

    Here are some options:

    Option 1: integer division

    Conversely, you can get the timestamp by converting to integer (to get nanoseconds) and dividing by 10**9.

    pd.to_datetime(['2019-01-15 13:30:00']).astype(int) / 10**9
    # Float64Index([1547559000.0], dtype='float64')
    

    Pros:

    • super fast

    Cons:

    • makes assumptions about how pandas internally stores dates
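
One way to soften that assumption, sketched here as an alternative rather than as part of the answer above, is to let NumPy cast the values to second resolution before reading them as integers, so the 10**9 factor is never hard-coded. This assumes timezone-naive input; the variable name seconds is illustrative.

```python
import pandas as pd

dates = pd.to_datetime(["2019-01-15 13:30:00"])

# Cast the underlying values to second resolution, then view them as integers.
# This avoids relying on the stored values being nanoseconds.
seconds = dates.to_numpy().astype("datetime64[s]").astype("int64")

print(seconds)  # [1547559000]
```

The result is a plain NumPy integer array, so it would need to be wrapped again if an Index or Series is wanted.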

    Option 2: recommended by pandas

    Pandas docs recommend using the following method:

    # create test data
    dates = pd.to_datetime(['2019-01-15 13:30:00'])
    
    # calculate unix datetime
    (dates - pd.Timestamp("1970-01-01")) // pd.Timedelta('1s')
    
    [out]:
    Int64Index([1547559000], dtype='int64')
    

    Pros:

    • "idiomatic", recommended by the library

    Cons:

    • unwieldy
    • not as performant as integer division
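
Applied to the DataFrame from the question, the same subtraction pattern might look like the sketch below. The column names unix_s and unix_ms are made up for illustration; changing the Timedelta divisor changes the unit of the result.

```python
import pandas as pd

df = pd.DataFrame({"time": [pd.to_datetime("2019-01-15 13:25:43")]})
epoch = pd.Timestamp("1970-01-01")

# Floor-divide the elapsed time by the unit you want to count in.
df["unix_s"] = (df["time"] - epoch) // pd.Timedelta("1s")
df["unix_ms"] = (df["time"] - epoch) // pd.Timedelta("1ms")

print(df[["unix_s", "unix_ms"]])
```

Here unix_s comes out as 1547558743 and unix_ms as 1547558743000 for the single row.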

    Option 3: pd.Timestamp

    If you have a single date string, you can use pd.Timestamp as shown in the other answer:

    pd.Timestamp('2019-01-15 13:30:00').timestamp()
    # 1547559000.0
    

    If you have to coerce multiple datetimes (where pd.to_datetime is your only option), you can initialize and map:

    pd.to_datetime(['2019-01-15 13:30:00']).map(pd.Timestamp.timestamp)
    # Float64Index([1547559000.0], dtype='float64')
    

    Pros:

    • best method for a single datetime string
    • easy to remember

    Cons:

    • not as performant as integer division