Search code examples
pythonpandasdatetimeunix-timestamp

int - String Type error while converting datetime to Unix Time epoch


I am trying to convert Datetime to Unix time epoch but I am getting following error.

Input:

userid,datetime,latitude,longitude
156,2014-02-01 00:00:00.739166+01,41.8836718276551,12.4877775603346
187,2014-02-01 00:00:01.148457+01,41.9285433333333,12.4690366666667
297,2014-02-01 00:00:01.220066+01,41.8910686119733,12.4927045625339
89,2014-02-01 00:00:01.470854+01,41.7931766914244,12.4321219603157
79,2014-02-01 00:00:01.631136+01,41.90027472,12.46274618
191,2014-02-01 00:00:02.048546+01,41.8523047579646,12.5774065771898
343,2014-02-01 00:00:02.647839+01,41.8921718255185,12.4696996165151
341,2014-02-01 00:00:02.709888+01,41.9102125627332,12.4770004336041
260,2014-02-01 00:00:03.458195+01,41.8658208551143,12.4655221109313

Program:

import pandas as pd
import numpy as np
import io

df = pd.read_csv('input.csv', 
                 #header=None, #no header in csv
                 header=['userid','datetime','latitude','longitude'], #set custom column names
                 parse_dates=['datetime']) #parse columns d, e to datetime

df['datetime'] = df['datetime'].astype(np.int64) // 10**9
#df['e'] = df['e'].astype(np.int64) // 10**9

df.to_csv('output.csv', header=True, index=False)

The above program worked fine when in python 2.7 but not that I have upgraded to python 3.x Anaconda I am not able to get results

Error:

  File "pandas\parser.pyx", line 519, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:5907)

TypeError: Can't convert 'int' object to str implicitly

Edit: input file here


Solution

  • The header argument in pd.read_csv expects an int or list of ints not a list of strings.

    from io import StringIO
    file="""
    userid,datetime,latitude,longitude
    156,2014-02-01 00:00:00.739166+01,41.8836718276551,12.4877775603346
    187,2014-02-01 00:00:01.148457+01,41.9285433333333,12.4690366666667
    297,2014-02-01 00:00:01.220066+01,41.8910686119733,12.4927045625339
    89,2014-02-01 00:00:01.470854+01,41.7931766914244,12.4321219603157
    79,2014-02-01 00:00:01.631136+01,41.90027472,12.46274618
    191,2014-02-01 00:00:02.048546+01,41.8523047579646,12.5774065771898
    343,2014-02-01 00:00:02.647839+01,41.8921718255185,12.4696996165151
    341,2014-02-01 00:00:02.709888+01,41.9102125627332,12.4770004336041
    260,2014-02-01 00:00:03.458195+01,41.8658208551143,12.4655221109313"""
    

    Let's try this read_csv statement:

    df = pd.read_csv(StringIO(file),parse_dates=['datetime'])
    df['datetime'] = df['datetime'].astype(np.int64) // 10**9
    
    print(df.head())
    

    output:

       userid    datetime   latitude  longitude
    0     156  1391209200  41.883672  12.487778
    1     187  1391209201  41.928543  12.469037
    2     297  1391209201  41.891069  12.492705
    3      89  1391209201  41.793177  12.432122
    4      79  1391209201  41.900275  12.462746