Tags: pivot, databricks, azure-databricks, unpivot

Pivot/Unpivot in Databricks


My question is: does Pivot/Unpivot work in the Azure Databricks Data Engineering workspace?

I can see it works perfectly in the SQL workspace.

Are there any limitations or differences in how these functions work between the two workspaces in Databricks?


Solution

  • Data Engineering Workspace: regarding Pivot and Unpivot, you can achieve similar functionality with the Spark transformations and functions available in the Data Engineering workspace. For example, you can use the groupBy and pivot functions, and build Unpivot-like behaviour with stack.

    I agree with you: if you are using the SQL workspace in Azure Databricks, you have access to SQL and can use the Pivot and Unpivot operations as part of your SQL queries. These operations transform your data by rotating rows into columns (Pivot) or, conversely, converting columns into rows (Unpivot), as sketched below.
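
    For illustration, here is a minimal, self-contained sketch of that SQL syntax, driven from PySpark through spark.sql so it can also be pasted into a notebook. The view name people, the sample rows, and the listed age values are placeholders I have assumed for this sketch, and the UNPIVOT clause requires a recent runtime (Spark 3.4+).

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical sample data registered as a temporary view named "people".
    spark.createDataFrame(
        [("Alice", 25), ("Bob", 30), ("Charlie", 35)], ["Name", "Age"]
    ).createOrReplaceTempView("people")

    # Pivot: rotate the listed Age values into columns, counting rows per Name.
    spark.sql("""
        SELECT * FROM people
        PIVOT (count(1) FOR Age IN (25, 30, 35))
    """).show()

    # Unpivot: rotate the Age column back into Attribute/Value rows.
    spark.sql("""
        SELECT * FROM people
        UNPIVOT (Value FOR Attribute IN (Age))
    """).show()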

    In Azure Databricks, the way you express the Pivot/Unpivot operations can vary depending on the workspace: the SQL workspace and the Data Engineering workspace use different syntax.

    For your reference, I have created two data frames to demonstrate the Pivot and Unpivot operations. Create a SparkSession:

    from pyspark.sql import SparkSession
    spark = SparkSession.builder.getOrCreate()
    

    Create sample data frame 1

    data1 = [("Alice", 25), ("Bob", 30), ("Charlie", 35)]
    df1 = spark.createDataFrame(data1, ["Name", "Age"])
    

    Create sample data frame 2

    data2 = [("Dave", 40), ("Eve", 45), ("Frank", 50)]
    df2 = spark.createDataFrame(data2, ["Name", "Age"])
    


    df1.display()
    


    df2.display()
    


    Pivot df1

    pivoted_df1 = df1.groupBy("Name").pivot("Age").count()
    

    Display the pivoted data frame

    pivoted_df1.show()
    
    
    pivoted_df1.display()
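
    As a side note, pivot can also be given the expected pivot values up front, which avoids an extra pass over the data to discover the distinct ages and makes the output columns deterministic. The value list [25, 30, 35] below is simply assumed to mirror the sample data:

    # Passing the pivot values explicitly (assumed to match the ages in df1)
    # skips the distinct-value scan and fixes the column order.
    pivoted_df1_explicit = df1.groupBy("Name").pivot("Age", [25, 30, 35]).count()
    pivoted_df1_explicit.display()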
    


    Unpivot df1

    unpivoted_df1 = df1.selectExpr(
        "Name",
        "stack(1, 'Age', cast(Age as string)) as (Attribute, Value)"
    )

    unpivoted_df1.display()
    

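    On newer runtimes (Spark 3.4 and above), there is also a built-in DataFrame.unpivot method (alias melt) that produces the same long format without writing stack by hand. A minimal sketch, assuming df1 from above:

    # unpivot(ids, values, variableColumnName, valueColumnName)
    unpivoted_df1_builtin = df1.unpivot("Name", ["Age"], "Attribute", "Value")
    unpivoted_df1_builtin.display()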

    Unpivot df2

    unpivoted_df2 = df2.selectExpr(
        "Name",
        "stack(1, 'Age', cast(Age as string)) as (Attribute, Value)"
    )

    unpivoted_df2.display()
    

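    If a single long-format result is wanted, the two unpivoted data frames can be stacked into one; this assumes both were produced with the same Name/Attribute/Value schema as above:

    # Combine the two long-format data frames into a single result.
    unpivoted_all = unpivoted_df1.unionByName(unpivoted_df2)
    unpivoted_all.display()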

    To summarize: while the Pivot/Unpivot operations are readily available as SQL clauses in the SQL workspace of Azure Databricks, in the Data Engineering workspace you may need to use PySpark transformations such as groupBy().pivot() and stack to achieve the same results.