Search code examples
pandasnumpymachine-learningepoch

Convert several columns to epoch pandas


I am trying to convert the following columns to epoch to prepare for machine learning the rest of my csv contains strings so I am assuming this is the best way to go, i tried to create a numpy array and transform it using datetime , etc but this did not work i have 4 columns i am trying to transform from dd/mm/yyyy to epoch ? I have tried this method

epoch_time = (1/2/2017 - datetime(1/1/1970)).total_seconds()

but i have 4 columns and i want to convert all of them thanks so much

state                       object
car_model                   object
car_make                    object
car_year                   float64
drive_date                  object
id                           int64
register_date       datetime64[ns]
profile_date                object
add_profile_date            object
dtype: object

id: 23432   state:ohio  car_model:ford car_make:fusion car_year:2016 drive_date:1/1/2017 register_date:12/25/2016 profile_date:12/25/2016 add_profile_date: 12/25/2016

Solution

  • Try this:

    Source DF:

    In [173]: df
    Out[173]:
          id state car_model car_make  car_year drive_date register_date profile_date add_profile_date
    0  23432  ohio      ford   fusion      2016   1/1/2017    2016-12-25   12/25/2016       12/25/2016
    
    In [174]: df.dtypes
    Out[174]:
    id                           int64
    state                       object
    car_model                   object
    car_make                    object
    car_year                     int64
    drive_date                  object
    register_date       datetime64[ns]
    profile_date                object
    add_profile_date            object
    dtype: object
    

    let's select date columns:

    In [175]: date_cols = df.columns[df.columns.str.contains('_date')]
    
    In [176]: date_cols
    Out[176]: Index(['drive_date', 'register_date', 'profile_date', 'add_profile_date'], dtype='object')
    

    first convert "string" dates to Pandas datetime, then convert it to UNIX epoch

    In [177]: for col in date_cols:
         ...:     if df.dtypes.loc[col] == 'object':
         ...:         df[col] = pd.to_datetime(df[col])
         ...:     df[col] = df[col].astype(np.int64) // 10**9
         ...:
    
    In [178]: df
    Out[178]:
          id state car_model car_make  car_year  drive_date  register_date  profile_date  add_profile_date
    0  23432  ohio      ford   fusion      2016  1483228800     1482624000    1482624000        1482624000
    
    In [179]: df.dtypes
    Out[179]:
    id                   int64
    state               object
    car_model           object
    car_make            object
    car_year             int64
    drive_date           int64
    register_date        int64
    profile_date         int64
    add_profile_date     int64
    dtype: object