Tags: python, apache-spark, pyspark, apache-spark-sql, databricks

How to query for the maximum / highest value in a field with PySpark


The following code produces a DataFrame whose version column holds the values 0 to 3.

from delta.tables import DeltaTable
from pyspark.sql.functions import col

df = DeltaTable.forPath(spark, '/mnt/lake/BASE/SQLClassification/cdcTest/dbo/cdcmergetest/1').history().select(col("version"))


Can someone show me how to query the DataFrame so that it returns only the maximum value, i.e. 3?

I have tried

df.select("*").max("version")

And

df.max("version")

But no luck

Any thoughts?


Solution

  • Use the max function from pyspark.sql.functions. A DataFrame itself has no max method (that method exists on GroupedData, the result of groupBy), which is why both of your attempts fail. This should work:

    from pyspark.sql import functions as F

    df.select(F.max("version").alias("max_version")).show()
    

    or

    df.agg(F.max("version").alias("max_version")).show()
    

    Input:

    +-------+
    |version|
    +-------+
    |      0|
    |      1|
    |      3|
    |      2|
    +-------+
    

    Output:

    +-----------+
    |max_version|
    +-----------+
    |          3|
    +-----------+
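
    If you need the maximum as a plain Python value rather than a one-row DataFrame (for example, to feed the version number into a later query), a minimal sketch, assuming df is the history DataFrame from the question:

    from pyspark.sql import functions as F

    # first() returns the single Row of the aggregate; [0] pulls out its value.
    # Note this triggers a Spark job, since the aggregate has to be computed.
    max_version = df.agg(F.max("version")).first()[0]
    print(max_version)  # 3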