How to define a pyspark schema with an array


I have a PySpark DataFrame with an inferred schema that looks like the one below. How would I define this schema explicitly in PySpark?

root
 |-- active: string (nullable = true)
 |-- activeText: string (nullable = true)
 |-- addOns: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- addOnID: string (nullable = true)
 |    |    |-- amount: string (nullable = true)
 |    |    |-- category: string (nullable = true)
 |    |    |-- code: string (nullable = true)
 |    |    |-- creditTo: string (nullable = true)
 |    |    |-- description: string (nullable = true)
 |    |    |-- productID: string (nullable = true)
 |    |    |-- quantity: string (nullable = true)
 |    |    |-- subscriptionID: string (nullable = true)
 |    |    |-- taxable: string (nullable = true)
 |-- addedBy: string (nullable = true)


I got this far but wasn't sure how to deal with the array.

schema = StructType(
    [
        StructField("active", IntegerType(), True),
        StructField("activeText", StringType(), True),
        ...
        StructField("addedBy", IntegerType(), True),
    ]
)

Thanks!


Solution

  • You can use ArrayType for this (its documentation has examples): wrap the StructType that describes one array element in an ArrayType.

    For your case:

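    from pyspark.sql.types import ArrayType, StringType, StructField, StructType

    # The second argument to ArrayType is containsNull, matching "containsNull = true" above.
    schema = StructType([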
        StructField("active", StringType(), True),
        StructField("activeText", StringType(), True),
        StructField("addOns", ArrayType(StructType([
            StructField("addOnID", StringType(), True),
            StructField("amount", StringType(), True),
            StructField("category", StringType(), True),
            StructField("code", StringType(), True),
            StructField("creditTo", StringType(), True),
            StructField("description", StringType(), True),
            StructField("productID", StringType(), True),
            StructField("quantity", StringType(), True),
            StructField("subscriptionID", StringType(), True),
            StructField("taxable", StringType(), True)
        ]), True)),
        StructField("addedBy", StringType(), True)
    ])