
Using sum in withColumn (PySpark / Python)


I need your help with a small issue, which I actually solved manually, but I would like to do it "right".

Data:

customer_id   gender
abc           m
def           w

etc.

Now it gets aggregated the following way:

gender_count = data.groupBy('gender').agg(F.countDistinct('customer_id').alias('amount'))

gender_count:

gender   amount
m        4
w        6
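
For anyone who wants to reproduce this step, here is a minimal self-contained sketch, assuming a local SparkSession and the usual import pyspark.sql.functions as F (the two sample rows are made up, so the counts will not match the example output above):

    from pyspark.sql import SparkSession
    import pyspark.sql.functions as F

    spark = SparkSession.builder.getOrCreate()

    # Toy data in the shape described above (only two customers here,
    # so the counts differ from the example output).
    data = spark.createDataFrame(
        [('abc', 'm'), ('def', 'w')],
        ['customer_id', 'gender'],
    )

    # Count distinct customers per gender.
    gender_count = data.groupBy('gender').agg(
        F.countDistinct('customer_id').alias('amount')
    )
    gender_count.show()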

Now I would like to create a new column with the total number of customers in order to compute the share of each gender. However, I could not find a function that works with withColumn. So what I do is sum up the number of customers beforehand and insert it as a literal value:

gender_count = gender_count.withColumn('total', F.lit(10)).withColumn('share', (F.col('amount') / F.col('total')))

Result:

gender   amount   total   share
m        4        10      0.4
w        6        10      0.6

Do you have any idea how I could replace the F.lit(10)? That would save me one manual step and remove a potential source of error.

Thank you!


Solution

  • You can use count() on the original DataFrame:

    total_items = data.count()
    
    gender_count = gender_count.withColumn('total', F.lit(total_items)).withColumn('share', (F.col('amount') / F.col('total')))
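
Note that data.count() counts rows, not distinct customers, so if customer_id can repeat in the raw data you would want data.select('customer_id').distinct().count() instead. Alternatively, you can avoid the separate driver-side step entirely by summing the aggregated amounts with a window over the whole DataFrame. A sketch, using an unpartitioned window (this moves all rows of the small aggregated frame into one partition, which is fine here because it only has one row per gender):

    from pyspark.sql import Window
    import pyspark.sql.functions as F

    # An empty partitionBy() spans the entire DataFrame,
    # so F.sum('amount') yields the grand total on every row.
    w = Window.partitionBy()

    gender_count = (gender_count
        .withColumn('total', F.sum('amount').over(w))
        .withColumn('share', F.col('amount') / F.col('total')))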