
Add one more StructField to schema


My PySpark data frame has the following schema:

spark_df.printSchema()
root
 |-- field_1: double (nullable = true)
 |-- field_2: double (nullable = true)
 |-- field_3 (nullable = true)
 |-- field_4: double (nullable = true)
 |-- field_5: double (nullable = true)
 |-- field_6: double (nullable = true)

I would like to add one more StructField to the beginning of the schema, so that the new schema looks like this:

root
 |-- field_0: string (nullable = true)
 |-- field_1: double (nullable = true)
 |-- field_2: double (nullable = true)
 |-- field_3 (nullable = true)
 |-- field_4: double (nullable = true)
 |-- field_5: double (nullable = true)
 |-- field_6: double (nullable = true)

I know I can manually create a new_schema like below:

new_schema = StructType([StructField("field_0", StringType(), True),
                            :
                         StructField("field_6", DoubleType(), True)])

This works for a small number of fields, but it is not practical to write out by hand when there are hundreds of them. Is there a more elegant way to add a new field to the beginning of the schema? Thanks!


Solution

  • You can copy the existing fields and prepend the new one:

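    from pyspark.sql.types import StructType, StructField, StringType

    # spark_df.schema.fields is a plain list of StructField objects
    to_prepend = [StructField("field_0", StringType(), True)]
    new_schema = StructType(to_prepend + spark_df.schema.fields)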