Hi there, I want to achieve something like this in PySpark:
SAS SQL: select * from flightData2015 group by DEST_COUNTRY_NAME order by count
This is my spark code:
flightData2015.selectExpr("*").groupBy("DEST_COUNTRY_NAME").orderBy("count").show()
I received this error:
AttributeError: 'GroupedData' object has no attribute 'orderBy'
I am new to PySpark. Don't PySpark's groupBy and orderBy work the same way as GROUP BY and ORDER BY in SAS SQL?
I also tried sort: flightData2015.selectExpr("*").groupBy("DEST_COUNTRY_NAME").sort("count").show()
and received a similar error: "AttributeError: 'GroupedData' object has no attribute 'sort'"
Please help!
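There is no need for groupBy if you want every row. In PySpark, groupBy returns a GroupedData object, which only offers aggregation methods (agg, count, sum and so on), not orderBy or sort; that is why you get the AttributeError. To list every row sorted by destination country and then by count, call orderBy on the DataFrame itself with multiple columns: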
from pyspark.sql import functions as F
vals = [("United States", "Angola", 13), ("United States", "Anguilla", 38), ("United States", "Antigua", 20), ("United Kingdom", "Antigua", 22), ("United Kingdom", "Peru", 50), ("United Kingdom", "Russia", 13), ("Argentina", "United Kingdom", 13)]
cols = ["destination_country_name", "origin_country_name", "count"]
df = spark.createDataFrame(vals, cols)
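# If you want count in descending order: display(df.orderBy(['destination_country_name', F.col('count').desc()]))
# display() is a Databricks notebook function; outside Databricks use df.orderBy(...).show() instead
display(df.orderBy(['destination_country_name', 'count']))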