Tags: apache-spark, pyspark, aws-glue

AWS Glue does not give a consistent result for PySpark orderBy


When running PySpark locally I get correct results, with each list ordered by BOOK_ID. But when the job is deployed on AWS Glue, the books come out unordered. The expected result schema and the code:

    root
     |-- AUTHOR_ID: integer
     |-- NAME: string
     |-- BOOK_LIST: array
     |    |-- BOOK_ID: integer
     |    |-- BOOK_NAME: string

    from pyspark.sql import functions as F

    # Sort by BOOK_ID descending, then collect each author's books,
    # expecting the resulting lists to keep that order
    result = (df_authors.join(df_books, on=["AUTHOR_ID"], how="left")
              .orderBy(F.col("BOOK_ID").desc())
              .groupBy("AUTHOR_ID", "NAME")
              .agg(F.collect_list(F.struct("BOOK_ID", "BOOK_NAME")))
              )
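
For reference, a minimal local setup for df_authors and df_books; the sample rows below are invented, not from the original question:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Invented sample data matching the schemas used above
    df_authors = spark.createDataFrame(
        [(1, "Alice"), (2, "Bob")],
        ["AUTHOR_ID", "NAME"],
    )
    df_books = spark.createDataFrame(
        [(1, 10, "Book A"), (1, 20, "Book B"), (2, 30, "Book C")],
        ["AUTHOR_ID", "BOOK_ID", "BOOK_NAME"],
    )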

Note: I'm using PySpark 3.2.1 locally and Glue 2.0.

Any suggestions, please?


Solution

  • Supposition

    Although I managed to run the job on Glue 3.0, which supports Spark 3.1, the orderBy still gives a wrong result.

    See: Migrating from AWS Glue 2.0 to AWS Glue 3.0

    The workaround that seems to give a correct result is to reduce the number of workers to 2, which is the minimum allowed number of workers.

    The explanation: a Glue job runs with many workers in parallel, so the data is spread over several partitions. The orderBy does sort the rows, but the shuffle triggered by the following groupBy redistributes them, and collect_list gives no ordering guarantee after a shuffle. With a single worker (a single partition) the order happens to survive.
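
    One way to check this (a sketch): inspect how many partitions carry the sorted rows. Locally everything may sit in one partition, while on a Glue cluster it is spread over many, and the groupBy shuffle then merges them in no guaranteed order.

        # Sketch: count the partitions holding the sorted rows.
        # With one partition the collected order survives; with several,
        # the groupBy shuffle merges them in arbitrary order.
        sorted_df = (df_authors.join(df_books, on=["AUTHOR_ID"], how="left")
                     .orderBy(F.col("BOOK_ID").desc()))
        print(sorted_df.rdd.getNumPartitions())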

    Suggested Solutions

    • Use the minimum number of workers (not a viable solution)
    • Apply the .orderBy to each DataFrame before the join (see the first sketch at the end of this answer)
    • Or use .coalesce(1) to bring everything onto a single partition first:
        result = (df_authors.join(df_books, on=["AUTHOR_ID"], how="left")
                  .coalesce(1)  # one partition, so the sort order survives the groupBy
                  .orderBy(F.col("BOOK_ID").desc())
                  .groupBy("AUTHOR_ID", "NAME")
                  .agg(F.collect_list(F.struct("BOOK_ID", "BOOK_NAME")))
                  )
    

    This gives the right result, but we lose performance: the whole dataset is processed on a single partition.
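
    A sketch of the second suggestion, sorting before the join. Note that Spark does not formally guarantee that row order survives a join or a groupBy, so this stays fragile:

        # Sort the books before joining (order preservation through the
        # join and groupBy is not guaranteed by Spark, so treat with care)
        df_books_sorted = df_books.orderBy(F.col("BOOK_ID").desc())

        result = (df_authors.join(df_books_sorted, on=["AUTHOR_ID"], how="left")
                  .groupBy("AUTHOR_ID", "NAME")
                  .agg(F.collect_list(F.struct("BOOK_ID", "BOOK_NAME")))
                  )

    An alternative worth considering (not part of the original answer): collect the list first, then sort it inside each row with F.sort_array. sort_array compares structs field by field, so putting BOOK_ID first in the struct sorts by BOOK_ID, and asc=False makes the order descending. This keeps the job fully parallel:

        # Collect unordered, then sort each author's array by BOOK_ID descending
        result = (df_authors.join(df_books, on=["AUTHOR_ID"], how="left")
                  .groupBy("AUTHOR_ID", "NAME")
                  .agg(F.sort_array(F.collect_list(F.struct("BOOK_ID", "BOOK_NAME")),
                                    asc=False).alias("BOOK_LIST"))
                  )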