
Copy the schema from one dataframe to another

I have a Spark dataframe (df1) with a particular schema, and another dataframe (df2) with the same columns but a different schema. I know how to cast the columns one by one, but with a large set of columns that would be quite lengthy. To keep the schema consistent across dataframes, I was wondering whether I could apply the schema of one dataframe to another, or write a function that does the job.

Here is an example (the schema of df1 first, then the schema of df2):

# root
#  |-- A: date (nullable = true)
#  |-- B: integer (nullable = true)
#  |-- C: string (nullable = true)

# root
#  |-- A: string (nullable = true)
#  |-- B: string (nullable = true)
#  |-- C: string (nullable = true)

I want to apply the schema of df1 to df2.

I tried the approach below for a single column. With a large number of columns, repeating it for each one would be lengthy.

df2 = df2.withColumn("B", df2["B"].cast('int'))


  • Yes, it's possible to do this dynamically by iterating over dataframe.schema.fields: df2.select(*[col(x.name).cast(x.dataType) for x in df1.schema.fields])


    from pyspark.sql.functions import col, to_date

    # Assumes an active SparkSession named `spark` (e.g. in the pyspark shell).
    df1 = spark.createDataFrame([('2022-02-02', 2, 'a')], ['A', 'B', 'C']).withColumn("A", to_date(col("A")))
    print("df1 Schema")
    df1.printSchema()
    #df1 Schema
    # |-- A: date (nullable = true)
    # |-- B: long (nullable = true)
    # |-- C: string (nullable = true)
    df2 = spark.createDataFrame([('2022-02-02', '2', 'a')], ['A', 'B', 'C'])
    print("df2 Schema")
    df2.printSchema()
    #df2 Schema
    # |-- A: string (nullable = true)
    # |-- B: string (nullable = true)
    # |-- C: string (nullable = true)
    # Cast each df2 column to the type of the same-named field in df1's schema
    df3 = df2.select(*[col(x.name).cast(x.dataType) for x in df1.schema.fields])
    df3.show(truncate=False)
    #|A         |B  |C  |
    #|2022-02-02|2  |a  |
    print("df3 Schema")
    df3.printSchema()
    #df3 Schema
    # |-- A: date (nullable = true)
    # |-- B: long (nullable = true)
    # |-- C: string (nullable = true)

    In this example, df1 is defined with date, long, and string types.

    df2 is defined with string types only.

    df3 is created by using df2 as the source data and casting each column to the type of the matching field in df1's schema.
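
Since the question also asks for a function that does the job, here is a minimal sketch of a variation on the same idea: building one SQL CAST expression per column, which could then be passed to DataFrame.selectExpr. The helper name cast_expressions and the sample df1_fields list are hypothetical and only for illustration. The helper itself is plain Python, so it can be checked without a running Spark session.

```python
from typing import Iterable, List, Tuple


def cast_expressions(fields: Iterable[Tuple[str, str]]) -> List[str]:
    """Build one Spark SQL CAST expression per (column name, type string) pair."""
    exprs = []
    for name, type_str in fields:
        # Backtick-quote the name so columns with spaces or dots still parse;
        # a literal backtick inside a name is escaped by doubling it.
        quoted = "`" + name.replace("`", "``") + "`"
        exprs.append(f"CAST({quoted} AS {type_str}) AS {quoted}")
    return exprs


# Hypothetical input mirroring df1's schema from the example (date, long, string).
df1_fields = [("A", "date"), ("B", "bigint"), ("C", "string")]
for expr in cast_expressions(df1_fields):
    print(expr)
```

With a live session, the pairs could be taken from df1 itself rather than typed by hand, for example df2.selectExpr(*cast_expressions((f.name, f.dataType.simpleString()) for f in df1.schema.fields)). This assumes every column in df1 also exists in df2; values that cannot be converted may come back as null or raise an error, depending on the Spark configuration.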