I have a dataframe with the following schema:
col1 as col1, col2, col3, ..., col n
I have another dataframe with the following schema:
col1 as diffName, col n+1, col n+2, ...
How can I append the values in col1 of the second dataframe to the first, with the rest of the columns filled with null values?
I have tried using union as well as merge, but to no avail.
You could manually make the columns of both dataframes match, then union them. For example, with these dataframes:
df1 = spark.createDataFrame([
    (1, 2),
    (3, 4)
], ['a', 'b'])
df2 = spark.createDataFrame([
    (5, 6, 7),
    (8, 9, 10)
], ['a2', 'c', 'd'])
The first step would be to make sure the column that is common to both dataframes has the same name:
df2 = df2.withColumnRenamed('a2', 'a')
Then you can make the columns match:
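from pyspark.sql.functions import lit

# Add every column that only df2 has to df1, filled with nulls
for c in df2.columns:
    if c not in df1.columns:
        df1 = df1.withColumn(c, lit(None))

# Add every column that only df1 has to df2, filled with nulls
for c in df1.columns:
    if c not in df2.columns:
        df2 = df2.withColumn(c, lit(None))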
Finally, you can take the union. I find unionByName to be safer, since it matches columns by name rather than by position:
df_all = df1.unionByName(df2)
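If you are on Spark 3.1 or later, unionByName also accepts allowMissingColumns=True, which fills columns that are missing on either side with nulls, so the manual loops may not be needed. Here is a minimal sketch under that version assumption. The helper name union_with_missing and its rename parameter are my own invention for illustration, not part of the Spark API:

```python
from typing import TYPE_CHECKING, Mapping, Optional

if TYPE_CHECKING:  # only needed for the type hints, not at runtime
    from pyspark.sql import DataFrame


def union_with_missing(
    df1: "DataFrame",
    df2: "DataFrame",
    rename: Optional[Mapping[str, str]] = None,
) -> "DataFrame":
    """Union two dataframes by column name, null-filling missing columns.

    Hypothetical helper (not a Spark function). Assumes Spark 3.1+,
    where unionByName gained the allowMissingColumns parameter.
    """
    # Rename df2's columns so the shared column has the same name in both.
    for old, new in (rename or {}).items():
        df2 = df2.withColumnRenamed(old, new)
    # Columns present in only one dataframe are filled with nulls.
    return df1.unionByName(df2, allowMissingColumns=True)
```

With the example dataframes as first created (before a2 is renamed), union_with_missing(df1, df2, rename={'a2': 'a'}) should give the columns a, b, c, d, with nulls in c and d for the rows that came from df1 and in b for the rows that came from df2.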