Search code examples
sqlstatisticsteradata

How to calculate the Z-Score for a specific dimension in SQL?


Good afternoon! I am trying to write a query using SQL in Teradata to show me the Z - score for each year in my data. I thought this should be a pretty simple problem, but I keep getting unexpected outputs or errors with each variation I try, and I'm not sure exactly what I'm doing wrong. I found a few similar posts on here, but I wasn't able to get them working for my specific situation. I'm still learning SQL so please bear with me and let me know if you need any more information. Thanks in advance for any assistance!

Formula for Zscore:

z = (x-μ)/σ, where x is the raw score, μ is the population mean, and σ is the population standard deviation

My data is structured like this: data structure But has a few more columns with customer info in them but for the purposes of this report I won't be using any of those and only care about the order year, not any of the other date info.

Order_Year Order_Count Zscore
2023 1443553 -2.175361497
2019 11298753 0.313237462
2022 13058147 0.757513182
2020 10673440 0.155335716
2021 12912660 0.720775338
2018 10963180 0.228499799
select 

OrderInfo.Order_Year

,sum(OrderInfo.Order_Count) as Order_Cnt

,(sum(OrderInfo.Order_Count) - AVG(sum(OrderInfo.Order_Count)) OVER (PARTITION BY OrderInfo.Order_Year)) /
    STDDEV_samp(sum(OrderInfo.Order_Count)) OVER (PARTITION BY OrderInfo.Order_Year) as zscore    
    
from ENTERPRISE_STATS.OrderInfo

group by OrderInfo.Order_Year

order by OrderInfo.Order_Year

; 
Order_Year Order_Cnt zscore
2018 10963180
2019 11298753
2020 10673440
2021 12912660
2022 13058147
2023 1443553

I tried a couple variations of this and I think it has something to with the sums, but if I remove any of those, it gives me this error:

Executed as Single statement. Failed [3504 : HY000] Selected non-aggregate values must be part of the associated group. Elapsed time = 00:00:00.047

STATEMENT 1: Select Statement failed.


Solution

  • WITH Order_Cnt AS(
    
    SELECT
    OrderInfo.Order_Year
    ,SUM(OrderInfo.Order_Count) AS Order_Cnt
    FROM ENTERPRISE_STATS.OrderInfo
    GROUP BY OrderInfo.Order_Year
    ORDER BY OrderInfo.Order_Year
    
    )
    
    SELECT
    OrderInfo.Order_Year, Order_Cnt,
    CAST((Order_Cnt - AVG(Order_Cnt) over())/
    STDDEV(Order_Cnt) over () as decimal(10,5)) AS Zscore
    FROM Order_Cnt