I have a Spark DataFrame:
df =
   a     b     c     d
0  12  12.0   car  bike
1  20  20.5   car  alto
2  15  12.0  bike   car
3  25  25.0  bike  jeep
I want to find the median of column 'a'. I couldn't find a suitable built-in way to compute the median, so I tried the NumPy/pandas-style median call I would normally use in Python, but got the error below:
import numpy as np
median = df['a'].median()  # df['a'] is a PySpark Column, not a pandas Series
Error:
TypeError: 'Column' object is not callable
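The error comes from how PySpark's Column treats attribute access: Column has no median() method, and its __getattr__ turns an unknown attribute into a nested-field lookup. So df['a'].median is itself another Column, and calling it with () raises the TypeError. The class below is a hypothetical toy (ToyColumn is not part of PySpark) that roughly mimics only that behaviour, to show where the message comes from:

```python
class ToyColumn:
    """Hypothetical stand-in for pyspark.sql.Column; illustration only."""

    def __init__(self, name):
        self.name = name

    def __getattr__(self, item):
        # Dunder lookups must still fail normally (copy, pickle, etc. rely on this).
        if item.startswith("__"):
            raise AttributeError(item)
        # Any other unknown attribute is treated as a nested field and
        # returns another column object instead of a method.
        return ToyColumn(f"{self.name}.{item}")


col = ToyColumn("a")
attr = col.median          # no error yet: this is just another ToyColumn
print(attr.name)           # a.median

try:
    attr()                 # ToyColumn defines no __call__, so this fails
except TypeError as exc:
    message = str(exc)
print(message)             # 'ToyColumn' object is not callable
```

The real Column behaves the same way for this purpose: the attribute access succeeds silently and only the call fails.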
Expected output:
17.5
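For reference, 17.5 is the conventional median of an even number of values: sort them and average the two middle ones. A quick check in plain Python on the four values of column 'a' (standard library only, no Spark involved); median_low is included to show the alternative convention of returning an actual value from the data:

```python
import statistics

a_values = [12, 20, 15, 25]  # column 'a' from the example DataFrame

# Even number of values: the median is the mean of the two middle ones.
# sorted -> [12, 15, 20, 25], so (15 + 20) / 2 = 17.5
print(statistics.median(a_values))      # 17.5

# A "lower median" instead picks an actual value from the data.
print(statistics.median_low(a_values))  # 15
```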
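You can use percentile_approx like this (write the column name without quotes: inside a SQL expression 'a' is the string literal "a", not column a):
from pyspark.sql import functions as F

df.agg(F.expr("percentile_approx(a, 0.5)")).show()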