scala, apache-spark, apache-spark-sql

How to add a nested column to a DataFrame


I have a dataframe df with the following schema:

root
 |-- city_name: string (nullable = true)
 |-- person: struct (nullable = true)
 |    |-- age: long (nullable = true)
 |    |-- name: string (nullable = true)

What I want to do is add a nested column, say car_brand, to my person struct. How would I do it?

The expected final schema would look like this:

root
 |-- city_name: string (nullable = true)
 |-- person: struct (nullable = true)
 |    |-- age: long (nullable = true)
 |    |-- name: string (nullable = true)
 |    |-- car_brand: string (nullable = true)

Solution

  • You can unpack the existing struct and rebuild it, including the new column in the same step. For example, adding "bmw" to all persons in the dataframe can be done like this:

    df.withColumn("person", struct($"person.*", lit("bmw").as("car_brand")))
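To try this end to end, here is a minimal sketch meant for spark-shell or a Scala script. The local session, the app name, and the sample rows are assumptions made up for illustration; only the column names come from the question. It also shows Column.withField as an alternative, which assumes Spark 3.1 or later.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, struct}

// Hypothetical local session, only for trying the snippet out.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("nested-column-sketch")
  .getOrCreate()
import spark.implicits._

// Made-up sample rows shaped like the schema in the question:
// a flat frame first, then age and name packed into a "person" struct.
val df = Seq(("Berlin", 34L, "Anna"), ("Paris", 28L, "Luc"))
  .toDF("city_name", "age", "name")
  .select(col("city_name"), struct(col("age"), col("name")).as("person"))

// Approach from the answer: expand the old struct's fields with "person.*"
// and build a new struct that also carries the constant car_brand column.
val withCar = df.withColumn(
  "person",
  struct(col("person.*"), lit("bmw").as("car_brand"))
)
withCar.printSchema()

// Alternative, assuming Spark 3.1 or later: add the field in place.
val withCarField = df.withColumn(
  "person",
  col("person").withField("car_brand", lit("bmw"))
)
withCarField.printSchema()
```

Both variants append car_brand as the last field of person. Any expression can stand in for lit("bmw"), for example another column of the same row.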