python · apache-spark · pyspark

From a single-row dataframe, how to create a new dataframe containing a list of column names and their values


Given df1:

|c1|c2|c3|c4|c5|c6|c7|c8|
|---|---|---|---|---|---|---|---|
|45|15|100|68|96|86|35|48|

How to create df2:

|columnName|ColumnValues|
|---|---|
|c1|45|
|c2|15|
|c3|100|
|c4|68|
|c5|96|
|c6|86|
|c7|35|
|c8|48|

Question: Using pyspark, if we are given dataframe df1 (shown above), how can we create a dataframe df2 that contains the column names of df1 in its first column and the values of df1 in its second column?

REMARKS: Please note that df1 will be dynamic; it will change based on the data loaded into it. As shown below, I already know how to do it if df1 is static:

data = [['c1', 45], ['c2', 15], ['c3', 100]]
mycolumns = ["myCol1","myCol2"]
df = spark.createDataFrame(data, mycolumns)
df.show()

For a static df1, the above code will show df2 as:

|myCol1|myCol2|
|---|---|
|c1|45|
|c2|15|
|c3|100|

Solution

  • Try this:

    # stack(n, 'name1', val1, 'name2', val2, ...) unpivots the single row
    # into n (name, value) rows.
    data = [["java", "dbms", "python"]]
    columns = ["Subject 1", "Subject 2", "Subject 3"]

    df1 = spark.createDataFrame(data, columns)

    column_names = df1.columns

    # Build the stack() expression dynamically from whatever columns df1 has,
    # so the same code works for any schema.
    pairs = ", ".join(f"'{col}', `{col}`" for col in column_names)
    df2 = df1.selectExpr(f"stack({len(column_names)}, {pairs}) as (columnName, columnValue)")
    df2.show()
    
    +----------+----------------+
    |columnName|     columnValue|
    +----------+----------------+
    | Subject 1|            java|
    | Subject 2|            dbms|
    | Subject 3|          python|
    +----------+----------------+