
pyspark sql: how to count rows with multiple conditions


I have a dataframe like this after some operations:

df_new_1 = df_old.filter(df_old["col1"] >= df_old["col2"])
df_new_2 = df_old.filter(df_old["col1"] < df_old["col2"])

print(df_new_1.count(), df_new_2.count())
>> 10 15

I can find the number of rows individually, as above, by calling count() on each dataframe. But how can I do this with a single PySpark SQL aggregation, i.e. in one pass over the rows? I want the result to look like this:

Row(check1=10, check2=15)

Solution

  • Since you tagged pyspark-sql, you can do the following:

    df_old.createOrReplaceTempView("df_table")
    
    spark.sql("""
    
        SELECT sum(int(col1 >= col2)) as check1
        ,      sum(int(col1 < col2)) as check2
        FROM df_table
    
    """).collect()
    

    Or use the API functions:

    from pyspark.sql.functions import expr
    
    df_old.agg(
        expr("sum(int(col1 >= col2)) as check1"), 
        expr("sum(int(col1 < col2)) as check2")
    ).collect()
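
    If you prefer to avoid SQL strings entirely, the same aggregation can be written with column expressions. Below is a minimal, self-contained sketch; the sample rows are made up for illustration, and only the col1/col2 names come from your question:

    # Sketch: same sum-of-boolean-indicators aggregation using column objects.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    # Toy data standing in for df_old (illustrative values only).
    df_old = spark.createDataFrame(
        [(3, 1), (2, 2), (1, 4)],
        ["col1", "col2"],
    )

    result = df_old.agg(
        # Cast each boolean comparison to int, then sum to get a count.
        F.sum((F.col("col1") >= F.col("col2")).cast("int")).alias("check1"),
        F.sum((F.col("col1") < F.col("col2")).cast("int")).alias("check2"),
    ).collect()

    print(result)  # e.g. [Row(check1=2, check2=1)] for the toy rows above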