Tags: python, sorting, pyspark, group-by, sql-order-by

pyspark groupBy and orderBy use together


Hi there, I want to achieve something like this:

SAS SQL: select * from flightData2015 group by DEST_COUNTRY_NAME order by count

My data looks like this: [screenshot of the flightData2015 DataFrame]

This is my Spark code:

flightData2015.selectExpr("*").groupBy("DEST_COUNTRY_NAME").orderBy("count").show()

I received this error:

AttributeError: 'GroupedData' object has no attribute 'orderBy'

I am new to PySpark. Are PySpark's groupBy and orderBy not the same as in SAS SQL?

I also tried sort: flightData2015.selectExpr("*").groupBy("DEST_COUNTRY_NAME").sort("count").show() and received much the same error: "AttributeError: 'GroupedData' object has no attribute 'sort'". Please help!


Solution

  • There is no need for a group by if you want to keep every row; you can simply order by multiple columns (see the next bullet, after the code, if you do want one aggregated row per country).

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    vals = [("United States", "Angola", 13), ("United States", "Anguilla", 38),
            ("United States", "Antigua", 20), ("United Kingdom", "Antigua", 22),
            ("United Kingdom", "Peru", 50), ("United Kingdom", "Russia", 13),
            ("Argentina", "United Kingdom", 13)]
    cols = ["destination_country_name", "origin_country_name", "count"]

    df = spark.createDataFrame(vals, cols)

    # Order by destination, then count. Wrap count in F.col(...).desc() if you want it descending:
    # df.orderBy(["destination_country_name", F.col("count").desc()]).show()
    # Note: display(...) is a Databricks notebook helper; .show() works in any PySpark session.
    df.orderBy(["destination_country_name", "count"]).show()
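  • If what you actually want is the aggregated count per destination country (which is what the SAS SQL reads like), a minimal sketch reusing df from above is to aggregate first and only then order; F.sum and the column name total_count here are illustrative choices, not part of the original answer:

    # Aggregate per destination, then order the resulting DataFrame by the total
    agg = (df.groupBy("destination_country_name")
             .agg(F.sum("count").alias("total_count"))
             .orderBy("total_count"))
    agg.show()

    This works because .agg(...) turns the GroupedData back into a regular DataFrame, which does have orderBy; calling orderBy directly on the result of groupBy is what raised the AttributeError in the question.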