I use the following code to cast a string time column timstm_hm to a timestamp column timstm_hm_timestamp. Here is the code:
from pyspark.sql.functions import col, unix_timestamp
df = df.withColumn('timstm_hm_timestamp', unix_timestamp(col('timstm_hm'), "yyyy-mm-dd HH:mm").cast("timestamp"))
Here are the results:
+----------------+-------------------+
|       timstm_hm|timstm_hm_timestamp|
+----------------+-------------------+
|2018-02-08 11:04|2018-01-08 11:04:00|
|2018-02-27 20:34|2018-01-27 20:34:00|
|2018-02-23 19:47|2018-01-23 19:47:00|
+----------------+-------------------+
Why is there a one-month difference after the conversion? It is very weird, as it works for January but not from February onward.
You just need to replace mm with uppercase MM. In the date format pattern, lowercase mm means minute-of-hour while uppercase MM means month-of-year, so your pattern never parses the month at all and it defaults to January; that is why only January dates appeared to convert correctly. Refer to the Java SimpleDateFormat documentation for more info on the pattern letters.
from pyspark.sql.functions import col, unix_timestamp
df.withColumn('timstm_hm_timestamp', unix_timestamp(col('timstm_hm'), "yyyy-MM-dd HH:mm").cast("timestamp")).show()
+----------------+-------------------+
| timstm_hm|timstm_hm_timestamp|
+----------------+-------------------+
|2018-02-08 11:04|2018-02-08 11:04:00|
+----------------+-------------------+
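To make the difference between the two pattern letters concrete, here is a minimal, self-contained sketch (assuming an active SparkSession; the one-row DataFrame is hypothetical sample data) that formats the same value with MM and with mm:
from pyspark.sql import SparkSession
from pyspark.sql.functions import date_format, to_timestamp

spark = SparkSession.builder.getOrCreate()

# Hypothetical one-row DataFrame mirroring the question's column
demo = spark.createDataFrame([("2018-02-08 11:04",)], ["timstm_hm"])
demo = demo.withColumn("ts", to_timestamp("timstm_hm", "yyyy-MM-dd HH:mm"))

# MM formats the month-of-year, mm formats the minute-of-hour
demo.select(
    date_format("ts", "MM").alias("month_MM"),
    date_format("ts", "mm").alias("minute_mm"),
).show()
# +--------+---------+
# |month_MM|minute_mm|
# +--------+---------+
# |      02|       04|
# +--------+---------+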
Also, you can achieve the same output by using just to_timestamp with uppercase MM. It returns a timestamp column directly, so the explicit cast is no longer needed.
from pyspark.sql.functions import to_timestamp
df.withColumn("timestm_hm_timestamp", to_timestamp("timstm_hm","yyyy-MM-dd HH:mm" )).show()
+----------------+--------------------+
| timstm_hm|timestm_hm_timestamp|
+----------------+--------------------+
|2018-02-08 11:04| 2018-02-08 11:04:00|
+----------------+--------------------+
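If you want to double-check that the new column really is a timestamp type and not just a reformatted string, a quick printSchema call shows it. This is a sketch that assumes df is the DataFrame from the question with the string column timstm_hm:
from pyspark.sql.functions import to_timestamp

df.withColumn("timstm_hm_timestamp", to_timestamp("timstm_hm", "yyyy-MM-dd HH:mm")).printSchema()
# root
#  |-- timstm_hm: string (nullable = true)
#  |-- timstm_hm_timestamp: timestamp (nullable = true)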