python · apache-spark · pyspark

How to find the max value in a column in pyspark dataframe


I have a pyspark dataframe

deviceId    timestamp
009eeb      2024-04-22
009eeb      2024-04-24
7c002v      2024-04-20
7c002v      null
4fd556      null
4fd556      null

I need to get the max timestamp per device in the final dataframe and drop duplicates. I tried the code below

from pyspark.sql import Window, functions as F

w = Window.partitionBy('deviceId')
df_max = (df.withColumn('max_col', F.max('timestamp').over(w))
            .where((F.col('timestamp') == F.col('max_col')) | (F.col('timestamp').isNull()))
            .dropDuplicates())

But this code doesn't give me what I need; I'm getting the output below instead.

deviceId    timestamp
009eeb      2024-04-24
7c002v      null
4fd556      null
4fd556      null

I need the result to be as below

deviceId    timestamp
009eeb      2024-04-24
7c002v      2024-04-20
4fd556      null

Please suggest what changes need to be made.


Solution

  • Just use the max function directly: in an aggregation, F.max ignores nulls, so grouping by deviceId keeps each device's latest non-null timestamp and returns null only when every timestamp for that device is null.

    from pyspark.sql import SparkSession, functions as F
    
    spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    
    data = [
        ('009eeb', '2024-04-22'),
        ('009eeb', '2024-04-24'),
        ('7c002v', '2024-04-20'),
        ('7c002v', None),
        ('4fd556', None),
        ('4fd556', None),
    ]
    df = spark.createDataFrame(data, ['deviceId', 'timestamp'])
    # F.max skips nulls, so the result is null only when all timestamps for a device are null
    df = df.groupBy('deviceId').agg(F.max('timestamp').alias('timestamp'))
    df.show(truncate=False)
    
    # +--------+----------+
    # |deviceId|timestamp |
    # +--------+----------+
    # |009eeb  |2024-04-24|
    # |4fd556  |null      |
    # |7c002v  |2024-04-20|
    # +--------+----------+
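
  • If you'd rather keep the window approach from your attempt, a minimal sketch (reusing spark, data and F from the snippet above; df2 here just rebuilds the sample dataframe) is to compare the timestamp with the per-device max using a null-safe equality instead of == plus isNull. Because F.max skips nulls, only devices whose timestamps are all null keep a null row, and dropDuplicates then collapses those repeated rows.

    from pyspark.sql import Window

    df2 = spark.createDataFrame(data, ['deviceId', 'timestamp'])
    w = Window.partitionBy('deviceId')
    df_win = (df2.withColumn('max_col', F.max('timestamp').over(w))
                 # eqNullSafe keeps rows equal to the per-device max,
                 # which is null only when every timestamp is null
                 .where(F.col('timestamp').eqNullSafe(F.col('max_col')))
                 .drop('max_col')
                 .dropDuplicates())
    df_win.show(truncate=False)

    This should produce the same three rows as the groupBy version above.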