sql, hive, sql-order-by, window-functions

Hive: For Duplicate ORDER BY Values, Will the Result Always Be the Same?


I know that in theory the order is arbitrary, but I was wondering: if you use a window function such as row_number() and the ORDER BY column has duplicate values within a given partition, will the result still be the same on every run? Does Hive look at other columns to determine the ordering even if they are not specified?


Solution

  • The order of rows with duplicate ORDER BY values is not guaranteed, because the query is processed in parallel by many mappers and reducers. Each of them may run faster or slower from one execution to the next, depending on the load on the cluster and on each node involved. Mapper results may not arrive in the same order even when they are processed by a single reducer, so tied rows can be numbered differently between runs.
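
If you need the same numbering on every run, the usual approach is to add a tie-breaking column to the ORDER BY so that no two rows in a partition compare as equal. The query below is only a sketch: the table `orders` and its columns `customer_id`, `order_date`, and `order_id` are hypothetical, and it assumes `order_id` is unique within each `customer_id`.

```sql
-- Hypothetical table: orders(customer_id, order_date, order_id).
-- Assumption: order_id is unique within each customer_id.
SELECT
    customer_id,
    order_date,
    order_id,
    ROW_NUMBER() OVER (
        PARTITION BY customer_id
        ORDER BY order_date, order_id   -- order_id breaks ties on order_date
    ) AS rn
FROM orders;
```

If no column (or combination of columns) is unique within the partition, ties remain and row_number() can still vary between runs. In that case rank() or dense_rank() may be a better fit, since they give tied rows the same value instead of an arbitrary order.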