Start of the week on Monday in Spark


This is my dataset:

from pyspark.sql import SparkSession, functions as F
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([('2021-02-07',),('2021-02-08',)], ['date']) \
    .select(
        F.col('date').cast('date'),
        F.date_format('date', 'EEEE').alias('weekday'),
        F.dayofweek('date').alias('weekday_number')
    )
df.show()
#+----------+-------+--------------+
#|      date|weekday|weekday_number|
#+----------+-------+--------------+
#|2021-02-07| Sunday|             1|
#|2021-02-08| Monday|             2|
#+----------+-------+--------------+

dayofweek numbers the days starting from Sunday (1 = Sunday, ..., 7 = Saturday), but I need the week to start on Monday.

Desired result:

+----------+-------+--------------+
|      date|weekday|weekday_number|
+----------+-------+--------------+
|2021-02-07| Sunday|             7|
|2021-02-08| Monday|             1|
+----------+-------+--------------+

Solution

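  • F.expr('weekday(date) + 1')

    The Spark SQL function weekday returns 0 for Monday through 6 for Sunday, so adding 1 gives 1 for Monday through 7 for Sunday.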

    from pyspark.sql import SparkSession, functions as F
    spark = SparkSession.builder.getOrCreate()
    
    df = spark.createDataFrame([('2021-02-07',),('2021-02-08',)], ['date']) \
        .select(
            F.col('date').cast('date'),
            F.date_format('date', 'EEEE').alias('weekday'),
            F.expr('weekday(date) + 1').alias('weekday_number'),
        )
    df.show()
    #+----------+-------+--------------+
    #|      date|weekday|weekday_number|
    #+----------+-------+--------------+
    #|2021-02-07| Sunday|             7|
    #|2021-02-08| Monday|             1|
    #+----------+-------+--------------+
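If you would rather keep dayofweek, the same numbering can be obtained by shifting its result with modular arithmetic. The sketch below checks that arithmetic in plain Python, without a Spark session. The helper names spark_dayofweek and monday_first are hypothetical and exist only for this illustration; spark_dayofweek merely imitates the Sunday-first numbering shown in the question.

```python
from datetime import date, timedelta


def spark_dayofweek(d: date) -> int:
    """Imitate Sunday-first numbering: 1 = Sunday, ..., 7 = Saturday (hypothetical helper)."""
    return d.isoweekday() % 7 + 1


def monday_first(sunday_first: int) -> int:
    """Remap 1 = Sunday..7 = Saturday to 1 = Monday..7 = Sunday (hypothetical helper)."""
    return (sunday_first + 5) % 7 + 1


start = date(2021, 2, 7)  # the Sunday from the question
for offset in range(7):
    d = start + timedelta(days=offset)
    # The remapped value agrees with ISO numbering (1 = Monday, ..., 7 = Sunday).
    assert monday_first(spark_dayofweek(d)) == d.isoweekday()
    # Python's date.weekday() is also 0 = Monday, so "+ 1" yields the same result.
    assert d.weekday() + 1 == d.isoweekday()
```

Under the same assumption, the PySpark column expression would presumably be ((F.dayofweek('date') + 5) % 7) + 1, which should give 7 for 2021-02-07 and 1 for 2021-02-08, the same as weekday(date) + 1.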