Search code examples
pythonpysparktimestamp

Convert from string to datetime in spark?


i've tried following some of the answers on here but i think i'm doing something wrong, ideally i would like to convert my string represenation to a timestamp.

Although initially the conversation caused miliseconds to be added to the end of it. Which i didn't want so i choose this approach

data: for example is 2022-03-15T01:46:32 which i would like this representation of yyyy-MM-ddTHH:mm:ss to be converted to yyyy-MM-dd HH:mm:ss

I attempted this way

new_df = df_geo.withColumn("timestamp", to_timestamp("timestamp", format= "yyyy-MM-dd HH:mm:ss"))

Which the outcome was a bunch of null values in the timestamp column, update the reason it was coming out as nulls is due to not including 'T' in the space between day and hour.

df_geo.withColumn("timestamp", to_timestamp("timestamp", format= "yyyy-MM-dd'T'HH:mm:ss"))

This still does cause milliseconds to be added to the end of the representation, I can try data format to change the representation but then with that it will change the data type to a string whereas i want to keep it as a timestamp


Solution

  • format the date with the desired format and then cast to timestamp

    input
       .withColumn("date_time_another_format",
         f.date_format('date_time_field,"yyyy-MM-dd HH:mm:ss")
           .cast("timestamp")
       )
    
      /*
    
      INPUT
      +---+-------------------+
      |id |date_time_field    |
      +---+-------------------+
      |1  |2024-02-20T01:30:00|
      |2  |2024-02-10T05:30:45|
      +---+-------------------+
    
      //OUTPUT SCHEMA
      root
       |-- id: integer (nullable = false)
       |-- date_time_field: string (nullable = true)
       |-- date_time_another_format: timestamp (nullable = true)
    
      //OUTPUT
      +---+-------------------+------------------------+
      | id|    date_time_field|date_time_another_format|
      +---+-------------------+------------------------+
      |  1|2024-02-20T01:30:00|     2024-02-20 01:30:00|
      |  2|2024-02-10T05:30:45|     2024-02-10 05:30:45|
      +---+-------------------+------------------------+
       */