
Pyspark: Adding row/column with single value of row counts


I have a PySpark dataframe that I'd like to get the row count for. Once I have the row count, I'd like to add it to the top-left corner of the dataframe, as shown below.

I've tried creating the row first and doing a union of the empty row and the dataframe, but the empty row gets overwritten. I've also tried adding the count as a literal in a new column, but I'm having trouble nulling out the remainder of that column as well as the rest of the count's row. Any advice?

dataframe:

col1 col2 col3 ... col13
string string timest ... int

for a few rows.

desired output:

row_count col1   col2   col3   ... col13
numofrows null   null   null   ... null
null      string string timest ... int

So the row count would sit where an otherwise empty row and empty column meet.


Solution

  • Assuming df is your dataframe:

    from pyspark.sql import functions as F

    cnt = df.count()
    columns_list = df.columns

    # Add an all-null row_count column, then move it to the front
    # so it matches the desired output
    df = df.withColumn("row_count", F.lit(None).cast("long"))
    df = df.select("row_count", *columns_list)
    schema = df.schema

    # Build a one-row dataframe: the count, padded with a null
    # for every original column
    cnt_line = spark.createDataFrame(
        [[cnt] + [None for x in columns_list]], schema=schema
    )

    # Put the count row on top; union is the non-deprecated
    # spelling of unionAll
    cnt_line.union(df).show()
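The key step is padding the single count value with one null per original column so the row lines up with the dataframe's schema. That padding can be sanity-checked in plain Python without Spark; the column names and count below are placeholders, and the count goes at whichever end of the row the `row_count` column occupies in the schema:

```python
columns_list = ["col1", "col2", "col3"]  # placeholder for df.columns
cnt = 42                                 # placeholder for df.count()

# Pad the count with one null per original column so the row
# matches the schema's width (len(columns_list) + 1 fields).
row_count_first = [cnt] + [None] * len(columns_list)   # row_count is the first column
row_count_last = [None] * len(columns_list) + [cnt]    # row_count was appended last

print(row_count_first)  # [42, None, None, None]
print(row_count_last)   # [None, None, None, 42]
```

Either list can then be wrapped in an outer list and passed to `spark.createDataFrame(...)` with the matching schema.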