
Why does a select on an ordered DataFrame in Snowpark destroy the ordering?


In Snowpark (Python API, version 0.11.0), I order a DataFrame by the attribute COUNT_OBJ and then show the top 5 EVENTDATEs. I noticed that the subsequent "select" destroys the ordering of the DataFrame. Is that expected?

As a long-time Spark developer, I find this behavior unexpected.

EDIT: More output as requested in comments:

[Screenshot: additional output]


Solution

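  • As pointed out by @Sergui in the comments, the problem is the "limit" in the generated SQL: when the ORDER BY and the LIMIT sit at different levels of the query, Snowflake does not guarantee which rows the LIMIT returns (see https://community.snowflake.com/s/article/SELECT-query-with-LIMIT-clause-returns-non-deterministic-result-if-ORDER-BY-clause-exists-in-different-level).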


    In my case, the solution is to move the "limit" before the "select", like this:

    With this change, the ordering is preserved.
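The difference between the two query shapes can be sketched in plain SQL. The example below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 as a stand-in for Snowflake (SQLite will not reproduce Snowflake's non-deterministic behavior), and the `events` table, its sample values, and the helper `top_event_dates` are hypothetical; only the column names come from the question. The fragile shape puts ORDER BY in an inner query and LIMIT outside it; the robust shape keeps ORDER BY and LIMIT at the same level, which is roughly what applying the limit before the select amounts to. In Snowpark terms that is something like `df.sort(col("COUNT_OBJ").desc()).limit(5).select("EVENTDATE").show()`, where `df` stands for the questioner's DataFrame.

```python
import sqlite3

# Hypothetical sample data (EVENTDATE, COUNT_OBJ), made up for illustration;
# the column names follow the question, the values do not come from it.
ROWS = [
    ("2021-01-01", 3),
    ("2021-01-02", 10),
    ("2021-01-03", 7),
    ("2021-01-04", 1),
    ("2021-01-05", 9),
    ("2021-01-06", 4),
    ("2021-01-07", 8),
]

# Fragile shape (kept for comparison, not executed here): ORDER BY in the
# inner query, LIMIT in the outer one. Per the linked Snowflake article,
# which rows the outer LIMIT keeps is not guaranteed in Snowflake.
FRAGILE_SQL = """
SELECT EVENTDATE FROM (
    SELECT EVENTDATE, COUNT_OBJ FROM events ORDER BY COUNT_OBJ DESC
) LIMIT ?
"""

# Robust shape: ORDER BY and LIMIT at the same level, so the set of top rows
# is well defined; the outer SELECT only projects EVENTDATE.
ROBUST_SQL = """
SELECT EVENTDATE FROM (
    SELECT EVENTDATE, COUNT_OBJ FROM events ORDER BY COUNT_OBJ DESC LIMIT ?
)
"""


def top_event_dates(rows, n=5):
    """Return the EVENTDATE values of the n rows with the highest COUNT_OBJ."""
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for Snowflake
    try:
        conn.execute("CREATE TABLE events (EVENTDATE TEXT, COUNT_OBJ INTEGER)")
        conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
        # Materialize the result before the connection is closed.
        return [row[0] for row in conn.execute(ROBUST_SQL, (n,))]
    finally:
        conn.close()


print(top_event_dates(ROWS))
```

Because the outermost query has no ORDER BY of its own, even the robust shape strictly guarantees only which rows are returned, not their display order. Repeating the ORDER BY in the outermost query (or keeping COUNT_OBJ in the projection and sorting last) is the portable way to pin the order down as well.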