
PySpark - Sum a column in dataframe and return results as int


I have a pyspark dataframe with a column of numbers. I need to sum that column and then have the result return as an int in a python variable.

df = spark.createDataFrame([("A", 20), ("B", 30), ("D", 80)],["Letter", "Number"])

I do the following to sum the column.

df.groupBy().sum()

But I get a dataframe back.

+-----------+
|sum(Number)|
+-----------+
|        130|
+-----------+

I would like 130 returned as an int stored in a variable to be used elsewhere in the program.

result = 130

Solution

  • The simplest way, really (index into the first row of the collected result to get a plain Python int):

    df.groupBy().sum().collect()[0][0]
    

    But it is a very slow operation. Avoid groupByKey; use the RDD with reduceByKey instead:

    df.rdd.map(lambda x: (1, x[1])).reduceByKey(lambda x, y: x + y).collect()[0][1]
    

    I tried this on a bigger dataset and measured the processing time:

    RDD and reduceByKey: 2.23 s

    groupByKey: 30.5 s
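To see what the map/reduceByKey step is actually computing, here is the same fold sketched in plain Python (no Spark required), using the rows from the question. This is only an illustration of the logic, not a replacement for the distributed version:

```python
from functools import reduce

# The same (Letter, Number) rows as the question's dataframe
rows = [("A", 20), ("B", 30), ("D", 80)]

# map step: keep only the numeric column, keyed by the constant 1
# so every value lands in the same reduce bucket
# (mirrors lambda x: (1, x[1]))
pairs = [(1, number) for _letter, number in rows]

# reduceByKey on a single key degenerates to a plain fold over the values
# (mirrors lambda x, y: x + y)
result = reduce(lambda x, y: x + y, (value for _key, value in pairs))

print(result)  # 130, as a plain Python int
```

Because there is only one key, `reduceByKey` behaves like a single fold; in Spark the same fold happens partially on each partition before the partial sums are combined, which is why it avoids the shuffle cost of `groupByKey`.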