Here is my table showing all the columns. What I want to basically happen is to be able to take the closing_price column and calculate a at least the moving standard deviation of about 7 days
+------+---------------+-------------+-------------+-------------+------------+-------------+------------+-------------------+----------------+----------------+------------------+-------------------+-------------------+
|volume|volume_weighted|opening_price|closing_price|highest_price|lowest_price| timestamp|transactions| new_datetime|moving_max_close|moving_min_close| moving_avg_close| range_close|moving_median_close|
+------+---------------+-------------+-------------+-------------+------------+-------------+------------+-------------------+----------------+----------------+------------------+-------------------+-------------------+
| 1| 55.388| 55.388| 55.388| 55.388| 55.388|1704067200000| 1|2024-01-01 08:00:00| 55.388| 55.388| 55.388| 0.0| 55.388|
| 1| 55.388| 55.388| 55.388| 55.388| 55.388|1704139200000| 1|2024-01-02 04:00:00| 55.388| 55.388| 55.388| 0.0| 55.388|
| 533| 55.5054| 55.405| 55.573| 55.586| 55.242|1704153600000| 533|2024-01-02 08:00:00| 55.573| 55.388|55.449666666666666|0.18500000000000227| 55.388|
| 473| 55.5836| 55.58| 55.672| 55.685| 55.445|1704168000000| 473|2024-01-02 12:00:00| 55.672| 55.388| 55.50525| 0.2839999999999989| 55.4805|
| 942| 55.6157| 55.679| 55.618| 55.683| 55.505|1704182400000| 942|2024-01-02 16:00:00| 55.672| 55.388| 55.5278| 0.2839999999999989| 55.573|
| 662| 55.6114| 55.619| 55.592| 55.662| 55.48|1704196800000| 662|2024-01-02 20:00:00| 55.672| 55.388| 55.5385| 0.2839999999999989| 55.582499999999996|
| 155| 55.5755| 55.589| 55.583| 55.6| 55.48|1704211200000| 155|2024-01-03 00:00:00| 55.672| 55.388| 55.57099999999999| 0.2839999999999989| 55.5875|
| 52| 55.5663| 55.588| 55.563| 55.6| 55.48|1704225600000| 52|2024-01-03 04:00:00| 55.672| 55.563| 55.60016666666667|0.10899999999999466| 55.5875|
| 504| 55.7318| 55.564| 55.681| 55.805| 55.55|1704240000000| 504|2024-01-03 08:00:00| 55.681| 55.563| 55.61816666666667| 0.117999999999995| 55.605000000000004|
| 798| 55.6446| 55.679| 55.583| 55.739| 55.48|1704254400000| 798|2024-01-03 12:00:00| 55.681| 55.563| 55.60333333333333| 0.117999999999995| 55.5875|
| 1012| 55.6643| 55.558| 55.751| 55.825| 54.92|1704268800000| 1012|2024-01-03 16:00:00| 55.751| 55.563| 55.62549999999999|0.18799999999999528| 55.5875|
| 684| 55.7647| 55.754| 55.802| 55.845| 55.593|1704283200000| 684|2024-01-03 20:00:00| 55.802| 55.563|55.660500000000006|0.23899999999999721| 55.632|
| 547| 55.7301| 55.815| 55.738| 55.838| 55.6|1704297600000| 547|2024-01-04 00:00:00| 55.802| 55.563| 55.68633333333333|0.23899999999999721| 55.7095|
| 184| 55.7401| 55.739| 55.673| 55.775| 55.63|1704312000000| 184|2024-01-04 04:00:00| 55.802| 55.583| 55.70466666666666| 0.2190000000000012| 55.7095|
| 639| 55.6488| 55.653| 55.568| 55.75| 55.495|1704326400000| 639|2024-01-04 08:00:00| 55.802| 55.568|55.685833333333335|0.23400000000000176| 55.7055|
| 1134| 55.5318| 55.569| 55.525| 55.615| 55.365|1704340800000| 1134|2024-01-04 12:00:00| 55.802| 55.525| 55.67616666666666| 0.277000000000001| 55.7055|
| 723| 55.4923| 55.522| 55.536| 55.565| 53.779|1704355200000| 723|2024-01-04 16:00:00| 55.802| 55.525| 55.64033333333333| 0.277000000000001| 55.6205|
| 410| 55.5717| 55.531| 55.592| 55.615| 55.44|1704369600000| 410|2024-01-04 20:00:00| 55.738| 55.525| 55.60533333333333|0.21300000000000097| 55.58|
| 138| 55.5721| 55.589| 55.574| 55.589| 55.475|1704384000000| 138|2024-01-05 00:00:00| 55.673| 55.525| 55.578|0.14800000000000324| 55.571|
| 80| 55.5543| 55.572| 55.523| 55.6| 55.52|1704398400000| 80|2024-01-05 04:00:00| 55.592| 55.523|55.553000000000004|0.06899999999999551| 55.552|
+------+---------------+-------------+-------------+-------------+------------+-------------+------------+-------------------+----------------+----------------+------------------+-------------------+-------------------+
This is my code specifying the window I'll be setting for the .over()
method of the stddev()
function. However whenever I do calculate it, it shows the following values
from pyspark.sql.window import Window
from pyspark.sql.functions import skewness, kurtosis, stddev, avg
# ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW which would be a 7 day moving standard deviation
window = Window.orderBy("new_datetime").rowsBetween(5, Window.currentRow)
# window = Window.orderBy("timestamp").rangeBetween(-6, 0)
result.select(skewness(result.closing_price).over(window=window).alias('skew_close')).show()
What could be the reason why the stddev()
fails to calculate the moving standard deviation of the closing price values of over 7 days? Is it because of my window function? Am I missing something here?
+----------+
|skew_close|
+----------+
| NULL|
| NULL|
| NULL|
| NULL|
| NULL|
| NULL|
| NULL|
| NULL|
| NULL|
| NULL|
| NULL|
| NULL|
| NULL|
| NULL|
| NULL|
| NULL|
| NULL|
| NULL|
| NULL|
| NULL|
+----------+
Use negative numbers for the rows before the current row in rowsBetween. It's NULL because there are no rows between start=5 and end=0
window = Window.orderBy("new_datetime").rowsBetween(-5, Window.currentRow)