
How to explode multiple columns (which are dictionaries with the same keys) of a PySpark DataFrame into rows


The DataFrame has several columns in dictionary (map) format, and they all share the same keys. How can I explode them into rows, keeping the key from any one of the columns, without using any joins?

The schema of the data frame is here. The columns that need to be exploded are pct_ci_tr, pct_ci_rn, pct_ci_ttv and pct_ci_comm.


Solution

  • I would do something like this:

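    from pyspark.sql import functions as F
    
    # stack(n, label1, col1, label2, col2, ...) turns the four map columns into
    # four rows per input row: lib = the column name, map_values = that map.
    # stack requires all four map columns to have the same map type.
    df.select(
        "s__",
        F.expr("""
            stack(
                4,
                "pct_ci_tr",
                pct_ci_tr,
                "pct_ci_rn",
                pct_ci_rn,
                "pct_ci_ttv",
                pct_ci_ttv,
                "pct_ci_comm",
                pct_ci_comm
            ) as (lib, map_values)"""
        ),
    ).select(
        "s__",
        "lib",
        # Exploding a map gives one row per entry, as columns key and value.
        F.explode(F.col("map_values")).alias("key", "value"),
    )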