
How can I convert a specific string date to date or datetime in Spark?


My Spark DataFrame has a string column with values like 'Sep 14, 2014, 1:34:36 PM'.

I want to convert this column to a date or datetime (timestamp) type, using Databricks and Spark.

I've already tried cast and to_date, but neither works: I get null every time.

How can I do that?

Thanks in advance!


Solution

  • Suppose we have a DataFrame created like this:

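    import spark.implicits._  // provides toDF; Databricks notebooks import this already
    import org.apache.spark.sql.functions.{col, to_timestamp}

    var ds = spark.sparkContext.parallelize(Seq(
      "Sep 14, 2014, 01:34:36 PM"
    )).toDF("date")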
    

    Parse the string column with to_timestamp and an explicit pattern:

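    // Spark 3.x documents the am/pm marker as a single "a"; "d" and "h" also accept unpadded values such as "1:34:36 PM"
    ds = ds.withColumn("casted", to_timestamp(col("date"), "MMM d, yyyy, h:mm:ss a"))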
    

    You get this result:

    +-------------------------+-------------------+
    |date                     |casted             |
    +-------------------------+-------------------+
    |Sep 14, 2014, 01:34:36 PM|2014-09-14 13:34:36|
    +-------------------------+-------------------+
    

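    The casted column is now a real timestamp rather than a string. If you only need the date, to_date accepts the same pattern, and other date/time functions can operate on the casted column directly. Good luck!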
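
As a follow-up to the to_date remark, here is a minimal sketch that parses the exact string from the question (single-digit hour) into both a timestamp and a date. The object name ParseDateSketch, the column names as_timestamp and as_date, and the local SparkSession are assumptions made for illustration; in a Databricks notebook the `spark` session already exists. The pattern "MMM d, yyyy, h:mm:ss a" is my assumption of a format that tolerates unpadded day and hour values.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, to_date, to_timestamp}

object ParseDateSketch {
  // Assumed pattern: single `d`, `h` and `a`, so "1:34:36 PM" and "01:34:36 PM" both parse.
  val Pattern = "MMM d, yyyy, h:mm:ss a"

  // Adds a timestamp column and a date column parsed from the string column "date".
  def parse(df: DataFrame): DataFrame =
    df.withColumn("as_timestamp", to_timestamp(col("date"), Pattern))
      .withColumn("as_date", to_date(col("date"), Pattern))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; not needed in a Databricks notebook.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("parse-date-sketch")
      .getOrCreate()
    import spark.implicits._

    // The exact string from the question.
    val df = Seq("Sep 14, 2014, 1:34:36 PM").toDF("date")

    parse(df).show(false)
    spark.stop()
  }
}
```

With the input above, as_timestamp should come out as 2014-09-14 13:34:36 and as_date as 2014-09-14. A null in either column usually means the pattern does not match the string exactly, which is also why a plain cast returns null here: cast expects an ISO-like layout such as yyyy-MM-dd HH:mm:ss.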