The following dataframe will produce values 0 to 3.
df = DeltaTable.forPath(spark, '/mnt/lake/BASE/SQLClassification/cdcTest/dbo/cdcmergetest/1').history().select(col("version"))
Can someone show me how to modify the dataframe so that it returns only the maximum value, i.e. 3?
I have tried
df.select("*").max("version")
And
df.max("version")
But neither works.
Any thoughts?
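Use the max function from pyspark.sql.functions. A DataFrame has no max method of its own, which is why both attempts fail. This should work:
from pyspark.sql import functions as F
df.select(F.max("version").alias("max_version")).show()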
or
df.agg(F.max("version").alias("max_version")).show()
Input:
+-------+
|version|
+-------+
|      0|
|      1|
|      3|
|      2|
+-------+
Output:
+-----------+
|max_version|
+-----------+
|          3|
+-----------+