Tags: apache-spark, casting, databricks, string-to-datetime

How can I convert a specific string format to date or datetime in Spark?


I have this string pattern in my Spark dataframe: '8 de jan de 2014 08:57:15'. It's a PT-BR pattern.

I want to convert this to datetime format, using Databricks and Spark.

I've already tried this: df.select(f.to_timestamp(f.col('date_column'), 'd MMM yyyy hh:mm:ss').alias('new_date_column')), but it didn't work: the result contained only NaT values.

How can I do that?

Thanks in advance!


Solution

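  • The format string does not match the data. The string contains the literal word 'de' twice, and the pattern has to include it, wrapped in single quotes so it is not read as pattern letters. It is also safer to use HH (24-hour clock) rather than hh (12-hour clock), since the value has no AM/PM marker. The correct format is: "d 'de' MMM 'de' yyyy HH:mm:ss". The same pattern can be passed to f.to_timestamp in the DataFrame call from the question.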

    >>> spark.sql("""select to_timestamp('8 de jan de 2014 08:57:15', "d 'de' MMM 'de' yyyy HH:mm:ss")""").show()
    +----------------------------------------------------------------------+        
    |to_timestamp(8 de jan de 2014 08:57:15, d 'de' MMM 'de' yyyy HH:mm:ss)|
    +----------------------------------------------------------------------+
    |                                                   2014-01-08 08:57:15|
    +----------------------------------------------------------------------+
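One caveat: 'jan' happens to match the English abbreviation as well, but other PT-BR abbreviations such as 'fev' or 'dez' may not be recognized by MMM, depending on the locale Spark parses with. As a fallback, here is a minimal plain-Python sketch that parses the same layout without relying on any locale. The function name parse_pt_br_timestamp and the month table are my own assumptions, not part of Spark or of the original answer, so adjust the abbreviations to whatever your data actually contains.

```python
from datetime import datetime
from typing import Optional

# Assumed PT-BR month abbreviations (lowercase); adjust to match your data.
PT_BR_MONTHS = {
    "jan": 1, "fev": 2, "mar": 3, "abr": 4, "mai": 5, "jun": 6,
    "jul": 7, "ago": 8, "set": 9, "out": 10, "nov": 11, "dez": 12,
}


def parse_pt_br_timestamp(text: Optional[str]) -> Optional[datetime]:
    """Parse strings like '8 de jan de 2014 08:57:15'; return None on mismatch."""
    if text is None:
        return None
    parts = text.strip().lower().split()
    # Expected tokens: day, 'de', month, 'de', year, HH:mm:ss
    if len(parts) != 6 or parts[1] != "de" or parts[3] != "de":
        return None
    # Tolerate a trailing dot on the abbreviation, e.g. 'jan.'
    month = PT_BR_MONTHS.get(parts[2].rstrip("."))
    if month is None:
        return None
    try:
        day, year = int(parts[0]), int(parts[4])
        hour, minute, second = (int(p) for p in parts[5].split(":"))
        return datetime(year, month, day, hour, minute, second)
    except ValueError:
        # Non-numeric fields, wrong number of time parts, or out-of-range values
        return None
```

If the built-in to_timestamp returns nulls for some months, a function like this could be wrapped in a UDF with a timestamp return type and applied to date_column. A UDF is usually slower than the built-in function, so it is worth trying the format string above first.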