I have a df derived from clustering that looks like this:
Cluster | Variable 1 | Variable 2 |
---|---|---|
0 | 334 | 32 |
0 | 0 | 45 |
3 | 453 | 0 |
3 | 320 | 0 |
0 | 0 | 28 |
1 | 467 | 49 |
3 | 324 | 16 |
1 | 58 | 2 |
And I'm trying to achieve the following result for each cluster and each variable:
Variable 1
Cluster | % of 0 | % of vals != 0 | Count of vals != 0 | Sum of values | % universe |
---|---|---|---|---|---|
0 | 67 | 33 | 1 | 334 | 17 |
1 | 0 | 100 | 2 | 525 | 27 |
3 | 0 | 100 | 3 | 1097 | 56 |
Variable 2
Cluster | % of 0 | % of vals != 0 | Count of vals != 0 | Sum of values | % universe |
---|---|---|---|---|---|
0 | 0 | 100 | 3 | 105 | 61 |
1 | 0 | 100 | 2 | 51 | 30 |
3 | 67 | 33 | 1 | 16 | 9 |
Note: % universe is each cluster's share of the total sum of the variable across all clusters. For Variable 1 that total is 334 + 525 + 1097 = 1956 (this is 100%, so 334 is 17% of it).
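For reference, that total is just the overall sum of the column; assuming the same df as above, a minimal snippet like this prints it:

```python
from pyspark.sql.functions import sum as sum_

# overall sum of Variable 1 across all clusters -> 1956
df.agg(sum_('Variable 1')).show()
```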
I'm in the process of learning PySpark and I'm struggling with the syntax. This is the code I'm trying, but I'm at a loss because I don't know how to manage the filtering so it iterates over each variable and each cluster:
```python
for i in list_of_variables:
    print(i)
    df.groupBy('Cluster').agg(
        (count(col(i) == 0) / df.filter(col('Cluster') == 0).count() * 100).alias('% of 0'),
        (count(col(i) != 0) / df.filter(col('Cluster') == 0).count() * 100).alias('% of vals diff than 0'),
        ...
```
I would be very grateful for any ideas that could shed some light on how to achieve this. Have an awesome day!
Maybe you could try something like this to get the counts part:
```python
from pyspark.sql.functions import col, count

for i in list_of_variables:
    print(i)
    # keep only the non-zero rows, then count them per cluster
    df.filter(col(i) != 0).groupBy('Cluster').agg(
        count('*').alias('Count_vals_dif_0')
    ).show()
```
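And to build the full table in one pass, here is a rough sketch of how you could extend it, using conditional aggregation with `when` so you don't need to filter inside the loop. It's untested against your exact data, so treat it as a starting point; the only name it assumes is your `list_of_variables` from the question:

```python
from pyspark.sql.functions import col, count, round as round_, sum as sum_, when

for i in list_of_variables:
    print(i)
    # universe total: the sum of this variable over the whole DataFrame (1956 for Variable 1)
    total = df.agg(sum_(col(i))).collect()[0][0]
    df.groupBy('Cluster').agg(
        # when() yields null where the condition fails, and count() ignores nulls,
        # so count(when(...)) is a conditional count within each cluster
        round_(count(when(col(i) == 0, 1)) / count('*') * 100).alias('% of 0'),
        round_(count(when(col(i) != 0, 1)) / count('*') * 100).alias('% of vals != 0'),
        count(when(col(i) != 0, 1)).alias('Count of vals != 0'),
        sum_(col(i)).alias('Sum of values'),
        round_(sum_(col(i)) / total * 100).alias('% universe'),
    ).orderBy('Cluster').show()
```

Computing `total` once per variable before the groupBy gives you the 100% denominator for % universe, and `count('*')` inside the agg counts all rows of the cluster, so the two percentage columns always add up to 100.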