python pandas datetime-format unix-timestamp

How to safely convert columns with dates represented as strings to unix timestamps using pandas?


I have the following DataFrame:

       dteday
0      2011-01-01
1      2011-01-02
2      2011-01-03
3      2011-01-04
4      2011-01-05
5      2011-01-06
6      2011-01-07
7      2011-01-08
8      2011-01-09
9      2011-01-10
10     2011-01-11
11     2011-01-12
12     2011-01-13
13     2011-01-14
14     2011-01-15
15     2011-01-16
16     2011-01-17

I want to transform this column into a column of Unix timestamps for these dates.

I tried this, but ran into the following error:

df['tmstamp'] = df.dteday.astype(np.int64)

Error: ValueError: invalid literal for int() with base 10: '2011-01-01'

I couldn't find a similar question anywhere. What's the problem? Thanks.


Solution

  • It looks like your current code is trying to convert the string '2011-01-01' directly to an integer (np.int64). That conversion fails because the string is not a valid integer literal, which is why you see the error.

    Use pd.to_datetime() to parse the strings in the column into datetime values first, and then convert the result to np.int64.

    Given the following dataframe:

            dates
    0  2011-01-01
    1  2011-01-02
    2  2011-01-03
    3  2011-01-04
    4  2011-01-05
    

    Try this:

    df['timestamp'] = pd.to_datetime(df['dates']).astype(np.int64)
    

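    Output (the values are nanoseconds since the Unix epoch; divide by 10**9 to get the usual seconds-based Unix timestamp):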

            dates            timestamp
    0  2011-01-01  1293840000000000000
    1  2011-01-02  1293926400000000000
    2  2011-01-03  1294012800000000000
    3  2011-01-04  1294099200000000000
    4  2011-01-05  1294185600000000000