i've tried following some of the answers on here but i think i'm doing something wrong, ideally i would like to convert my string represenation to a timestamp.
Although initially the conversation caused miliseconds to be added to the end of it. Which i didn't want so i choose this approach
data: for example is 2022-03-15T01:46:32 which i would like this representation of yyyy-MM-ddTHH:mm:ss
to be converted to yyyy-MM-dd HH:mm:ss
I attempted this way
new_df = df_geo.withColumn("timestamp", to_timestamp("timestamp", format= "yyyy-MM-dd HH:mm:ss"))
Which the outcome was a bunch of null values in the timestamp column, update the reason it was coming out as nulls is due to not including 'T' in the space between day and hour.
df_geo.withColumn("timestamp", to_timestamp("timestamp", format= "yyyy-MM-dd'T'HH:mm:ss"))
This still does cause milliseconds to be added to the end of the representation, I can try data format to change the representation but then with that it will change the data type to a string whereas i want to keep it as a timestamp
format the date with the desired format and then cast to timestamp
input
.withColumn("date_time_another_format",
f.date_format('date_time_field,"yyyy-MM-dd HH:mm:ss")
.cast("timestamp")
)
/*
INPUT
+---+-------------------+
|id |date_time_field |
+---+-------------------+
|1 |2024-02-20T01:30:00|
|2 |2024-02-10T05:30:45|
+---+-------------------+
//OUTPUT SCHEMA
root
|-- id: integer (nullable = false)
|-- date_time_field: string (nullable = true)
|-- date_time_another_format: timestamp (nullable = true)
//OUTPUT
+---+-------------------+------------------------+
| id| date_time_field|date_time_another_format|
+---+-------------------+------------------------+
| 1|2024-02-20T01:30:00| 2024-02-20 01:30:00|
| 2|2024-02-10T05:30:45| 2024-02-10 05:30:45|
+---+-------------------+------------------------+
*/