Search code examples
sumdatabricksalias

Databricks Question - Column alias after groupBy() with use of sum() SQL function to perform summary aggregation


I'm trying to build a code in a Databrics Pyton notebook to do this calculation; sum(Percentage_Length) / sum(Percentage_Length) / AverageSpeed

After executing the code The expected result should have the column headings as flows

Id:integer
date:date
Hour:integer
QtrHour:integer
Time:string
Source:string
Result:double

However I'm not too sure how to write the code correctly to get the last column heading "Result"

The code I wrote is;

Agg_Result = Suna_link_join.groupBy("Id","date", "Hour", "QtrHour", "Time", "Source").agg(
sum(col("Percentage_Length").alias("V1"))/(sum(col("Percentage_Length").alias("V2"))/sum(col("AvgSpeed") * col("Percentage_Length")).alias("AverageSpeed") ).alias("Result"))

But once I executed, it result the column headings as follows. What is the reason to show the whole function as the last column heading instead of the name "Result"???????

Id:integer,date:date,Hour:integer,QtrHour:integer,Time:string,Source:string,(sum(Percentage_Length AS `V1`) / (sum(Percentage_Length AS `V2`) / sum((AvgSpeed * Percentage_Length)) AS `AverageSpeed`) AS `Result`):double

Solution

  • When you .alias("Result") you are only applying the alias to the right-side sum(...) on the aggregation.

    Wrap the entire aggregation in parenthesis to ensure you're aliasing the final expression, not a partial expression, like so:

    .agg(
      (sum(col("Percentage_Length").alias("V1")) / (sum(col("Percentage_Length").alias("V2"))/sum(col("AvgSpeed") * col("Percentage_Length")).alias("AverageSpeed") )).alias("Result")
    # ^ relevant parenthesis
    )