Tags: apache-spark, date, pyspark, apache-spark-sql, locale

Return month name in specified locale


Using date_format, we can extract the month name from a date:

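from pyspark.sql import SparkSession, functions as F
# Reuse the active session (e.g. in the pyspark shell) or create one
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([('2021-05-01',),('2021-06-01',)], ['c1']).select(F.col('c1').cast('date'))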
df = df.withColumn('month', F.date_format('c1', 'LLLL'))
df.show()
#+----------+-----+
#|        c1|month|
#+----------+-----+
#|2021-05-01|  May|
#|2021-06-01| June|
#+----------+-----+

It's in English, but I would like to get it in French.

I have found that Spark is aware of month names in French!

spark.sql("select to_csv(named_struct('date', date '1970-06-01'), map('dateFormat', 'LLLL', 'locale', 'FR'))").show()
#+---------------------------------------------+
#|to_csv(named_struct(date, DATE '1970-06-01'))|
#+---------------------------------------------+
#|                                         juin|
#+---------------------------------------------+

But I cannot find a way to make date_format accept another locale. How can these two features be combined to produce the following result?

+----------+-----+
|        c1|month|
+----------+-----+
|2021-05-01|  mai|
|2021-06-01| juin|
+----------+-----+

Solution

  • Thanks to this clever guy, here is a neat way to return results in another language (locale): wrap the date column in a struct and pass the dateFormat and locale options to to_csv:

    df = df.withColumn('month', F.to_csv(F.struct('c1'), {'dateFormat': 'LLLL', 'locale': 'fr'}))
    
    df.show()
    #+----------+-----+
    #|        c1|month|
    #+----------+-----+
    #|2021-05-01|  mai|
    #|2021-06-01| juin|
    #+----------+-----+
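
If the to_csv options are not available in your Spark version, or you just want to sanity-check the output, a hand-written lookup table is one possible fallback. The sketch below is plain Python; the names FRENCH_MONTHS and month_name_fr are hypothetical and not part of any Spark API, and the table is written in lowercase to match the output shown above.

```python
from datetime import date

# Hypothetical lookup table (not a Spark API): French month names,
# lowercase to match what the to_csv approach prints.
FRENCH_MONTHS = {
    1: "janvier", 2: "février", 3: "mars", 4: "avril",
    5: "mai", 6: "juin", 7: "juillet", 8: "août",
    9: "septembre", 10: "octobre", 11: "novembre", 12: "décembre",
}


def month_name_fr(d: date) -> str:
    """Return the French month name for a date (illustrative helper)."""
    return FRENCH_MONTHS[d.month]


print(month_name_fr(date(2021, 5, 1)))  # mai
print(month_name_fr(date(2021, 6, 1)))  # juin
```

A function like this could be wrapped with F.udf to apply it to a column, though a Python UDF is usually slower than the built-in to_csv route, so the answer above is likely the better default when it works.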