apache-spark pyspark apache-spark-sql timestamp dayofweek

Change Spark day timestamp from name to number

I'm trying to get the day of the week in int format from it saying Mon, Tues, Wed, etc.

I'm using this code right now:

from pyspark.sql.functions import date_format
df_ex1 = df.withColumn("day", date_format('Timestamp', 'E'))

Output:

+-------------------+----+----+----+-----+------------+-----------------+--------------+----+-----+---+
|Timestamp          |Open|High|Low |Close|Volume_(BTC)|Volume_(Currency)|Weighted_Price|Year|Month|day|
+-------------------+----+----+----+-----+------------+-----------------+--------------+----+-----+---+
|2011-12-31 09:52:00|4.39|4.39|4.39|4.39 |0.45558086  |2.0              |4.39          |2011|12   |Sat|
|2011-12-31 09:53:00|NaN |NaN |NaN |NaN  |NaN         |NaN              |NaN           |2011|12   |Sat|
|2011-12-31 09:54:00|NaN |NaN |NaN |NaN  |NaN         |NaN              |NaN           |2011|12   |Sat|
|2011-12-31 09:55:00|NaN |NaN |NaN |NaN  |NaN         |NaN              |NaN           |2011|12   |Sat|
|2011-12-31 09:56:00|NaN |NaN |NaN |NaN  |NaN         |NaN              |NaN           |2011|12   |Sat|
+-------------------+----+----+----+-----+------------+-----------------+--------------+----+-----+---+
only showing top 5 rows

Solution

There are two Spark native ways: dayofweek and weekday (the latter - only SQL).

from pyspark.sql import functions as F

df = spark.createDataFrame(
    [('2022-08-01', 'Mon'),
     ('2022-08-02', 'Tue'),
     ('2022-08-03', 'Wed'),
     ('2022-08-04', 'Thu'),
     ('2022-08-05', 'Fri'),
     ('2022-08-06', 'Sat'),
     ('2022-08-07', 'Sun')],
    ['Timestamp', 'day'])

df = df.withColumn('day_number_1', F.dayofweek('Timestamp'))
df = df.withColumn('day_number_2', F.expr("weekday(Timestamp)"))

df.show()
# +----------+---+------------+------------+
# | Timestamp|day|day_number_1|day_number_2|
# +----------+---+------------+------------+
# |2022-08-01|Mon|           2|           0|
# |2022-08-02|Tue|           3|           1|
# |2022-08-03|Wed|           4|           2|
# |2022-08-04|Thu|           5|           3|
# |2022-08-05|Fri|           6|           4|
# |2022-08-06|Sat|           7|           5|
# |2022-08-07|Sun|           1|           6|
# +----------+---+------------+------------+