Tags: pivot, databricks, azure-databricks, unpivot

Pivot/Unpivot in Databricks


My question is: does Pivot/Unpivot work in the Azure Databricks Data Engineering workspace?

I can see it works perfectly in the SQL workspace.

Are there any limitations or differences in how these functions work between the two workspaces in Databricks?


Solution

  • Data Engineering Workspace: regarding Pivot and Unpivot, you can achieve similar functionality with the Spark transformations and functions available in the Data Engineering workspace. For example, you can use the groupBy and pivot functions, and build Unpivot-like behaviour with stack.

    I agree with you: if you are using the SQL workspace in Azure Databricks, you have access to SQL and can use the Pivot and Unpivot operations as part of your SQL queries. These operations transform your data by rotating rows into columns (Pivot) or, conversely, converting columns into rows (Unpivot), as sketched below.
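
    For illustration, here is a minimal, self-contained sketch of that SQL syntax, driven from PySpark through spark.sql so it can also be pasted into a notebook. The view name people, the sample rows, and the listed age values are placeholders I have assumed for this sketch, and the UNPIVOT clause requires a recent runtime (Spark 3.4+).

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical sample data registered as a temporary view named "people".
    spark.createDataFrame(
        [("Alice", 25), ("Bob", 30), ("Charlie", 35)], ["Name", "Age"]
    ).createOrReplaceTempView("people")

    # Pivot: rotate the listed Age values into columns, counting rows per Name.
    spark.sql("""
        SELECT * FROM people
        PIVOT (count(1) FOR Age IN (25, 30, 35))
    """).show()

    # Unpivot: rotate the Age column back into Attribute/Value rows.
    spark.sql("""
        SELECT * FROM people
        UNPIVOT (Value FOR Attribute IN (Age))
    """).show()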

    In Azure Databricks, the way you express the Pivot/Unpivot operations can vary depending on the workspace: the SQL workspace and the Data Engineering workspace use different syntax.

    For your reference, I have created two data frames to demonstrate the Pivot and Unpivot operations. Create a SparkSession:

    from pyspark.sql import SparkSession
    spark = SparkSession.builder.getOrCreate()
    

    Create sample data frame 1

    data1 = [("Alice", 25), ("Bob", 30), ("Charlie", 35)]
    df1 = spark.createDataFrame(data1, ["Name", "Age"])
    

    Create sample data frame 2

    data2 = [("Dave", 40), ("Eve", 45), ("Frank", 50)]
    df2 = spark.createDataFrame(data2, ["Name", "Age"])
    


    df1.display()
    


    df2.display()
    


    Pivot df1

    pivoted_df1 = df1.groupBy("Name").pivot("Age").count()
    

    Display the pivoted data frame

    pivoted_df1.show()
    
    
    pivoted_df1.display()
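
    As a side note, pivot can also be given the expected pivot values up front, which avoids an extra pass over the data to discover the distinct ages and makes the output columns deterministic. The value list [25, 30, 35] below is simply assumed to mirror the sample data:

    # Passing the pivot values explicitly (assumed to match the ages in df1)
    # skips the distinct-value scan and fixes the column order.
    pivoted_df1_explicit = df1.groupBy("Name").pivot("Age", [25, 30, 35]).count()
    pivoted_df1_explicit.display()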
    


    Unpivot df1

    unpivoted_df1 = df1.selectExpr(
        "Name",
        "stack(1, 'Age', cast(Age as string)) as (Attribute, Value)"
    )

    unpivoted_df1.display()
    

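    On newer runtimes (Spark 3.4 and above), there is also a built-in DataFrame.unpivot method (alias melt) that produces the same long format without writing stack by hand. A minimal sketch, assuming df1 from above:

    # unpivot(ids, values, variableColumnName, valueColumnName)
    unpivoted_df1_builtin = df1.unpivot("Name", ["Age"], "Attribute", "Value")
    unpivoted_df1_builtin.display()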

    Unpivot df2

    unpivoted_df2 = df2.selectExpr(
        "Name",
        "stack(1, 'Age', cast(Age as string)) as (Attribute, Value)"
    )

    unpivoted_df2.display()
    

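    If a single long-format result is wanted, the two unpivoted data frames can be stacked into one; this assumes both were produced with the same Name/Attribute/Value schema as above:

    # Combine the two long-format data frames into a single result.
    unpivoted_all = unpivoted_df1.unionByName(unpivoted_df2)
    unpivoted_all.display()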

    To summarize: while the Pivot/Unpivot operations are readily available as SQL clauses in the SQL workspace of Azure Databricks, in the Data Engineering workspace you may need to use PySpark transformations such as groupBy().pivot() and stack to achieve the same results.