I'm using the query below to get the start and end of the week. Although dayofweek is part of pyspark.sql.functions, in this form inside a SQL query it doesn't seem to be recognized. I tried concatenating the function, but that still did not work. My main issue is that one of the timestamps listed depends on the start and end of the week range to perform correctly and output the values I'm looking for.
somedataset.registerTempTable("mytable")
spark.sql("SELECT DISTINCT max(col1) AS col1_edited, \
col2, col3, max(col4) AS col4_edited, \
min(MyTimestamp1) AS MyTimestamp1_edited, \
date_sub(CAST(timestamp2 AS date), dayofweek-1) AS week_start, \
date_sub(CAST(timestamp3 AS date), 7-dayofweek) AS week_end \
FROM mytable \
GROUP BY col1_edited, col2, col3, col4, MyTimestamp1, \
timestamp2, \
timestamp3").registerTempTable("mineable")
I'm using PySpark 3.1.1+amzn.0; any help will be greatly appreciated.
UPDATE: based on the comment below, this was the small fix (everything else remained the same):
date_sub(CAST(timestamp2 AS date), dayofweek(timestamp2)-1) as week_start,\
date_sub(CAST(timestamp3 AS date), 7-dayofweek(timestamp3)) as week_end,\
DAYOFWEEK should be called with a column as an argument:
from datetime import date
_data = [
(date(2023, 10, 13), ),
(date(2023, 10, 16), ),
(date(2023, 10, 28), ),
]
_schema = ['my_date']
df = spark.createDataFrame(_data, _schema)
df.createOrReplaceTempView('temp')
q = """
SELECT
my_date,
    DATE_ADD(my_date, 1 - DAYOFWEEK(my_date)) AS WeekStart,
    DATE_ADD(my_date, 7 - DAYOFWEEK(my_date)) AS WeekEnd
FROM
temp
"""
res = spark.sql(q)
res.show()
# +----------+----------+----------+
# | my_date| WeekStart| WeekEnd|
# +----------+----------+----------+
# |2023-10-13|2023-10-08|2023-10-14|
# |2023-10-16|2023-10-15|2023-10-21|
# |2023-10-28|2023-10-22|2023-10-28|
# +----------+----------+----------+