pyspark, apache-spark-sql

Insert selective columns into a PySpark dataframe


I have a dataframe with the following schema:

col1 as col1, col2, col3, ..., coln

I have another dataframe with the following schema:

col1 as diffName, coln+1, coln+2, ...

How can I append the values in col1 of the second dataframe to the first one, with the rest of the columns filled with null values?

I have tried using union as well as merge, but to no avail.


Solution

  • You could manually make the columns of both dataframes match, then union them. For example, with these dataframes:

    df1 = spark.createDataFrame([
        (1, 2),
        (3, 4)
    ], ['a', 'b'])
    
    df2 = spark.createDataFrame([
        (5, 6, 7),
        (8, 9, 10)
    ], ['a2', 'c', 'd'])
    
    The first step would be to make sure the column that is common to both dataframes has the same name:

    df2 = df2.withColumnRenamed('a2', 'a')
    

    Then you can make the column sets match by adding each missing column as nulls:

    from pyspark.sql.functions import lit
    
    # Add df2's extra columns to df1, filled with nulls
    for c in df2.columns:
        if c not in df1.columns:
            df1 = df1.withColumn(c, lit(None))
    
    # Add df1's extra columns to df2, filled with nulls
    for c in df1.columns:
        if c not in df2.columns:
            df2 = df2.withColumn(c, lit(None))
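
    Note that lit(None) produces a NullType column; Spark will generally reconcile it with the matching column's type when you union, but if you hit a type mismatch you can cast the null columns explicitly. A small sketch of that variant, assuming you want each new column in df1 to take its type from df2:

    # Look up df2's column types by name (df2.dtypes returns a list of
    # (column, type-string) tuples) and cast the null literals to match
    df2_types = dict(df2.dtypes)
    for c in df2.columns:
        if c not in df1.columns:
            df1 = df1.withColumn(c, lit(None).cast(df2_types[c]))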
    

    Finally, you can take the union. I find unionByName to be safer than the positional union, since it matches columns by name rather than by position:

    df_all = df1.unionByName(df2)
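
    With the example data above, the combined dataframe comes out like this (exact null rendering can differ slightly between Spark versions):

    df_all.show()
    # +---+----+----+----+
    # |  a|   b|   c|   d|
    # +---+----+----+----+
    # |  1|   2|null|null|
    # |  3|   4|null|null|
    # |  5|null|   6|   7|
    # |  8|null|   9|  10|
    # +---+----+----+----+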
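
    If you are on Spark 3.1 or later, unionByName can fill in the missing columns for you via its allowMissingColumns flag, which makes the two loops above unnecessary (the rename step is still needed):

    # Spark 3.1+: columns missing on either side are added as nulls
    df_all = df1.unionByName(df2, allowMissingColumns=True)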