Search code examples
sqldatabricksazure-databricksdatabricks-sql

in databricks does array_union maintain order


MERGE INTO final_table a
USING 
(
select student_id,array_agg(distinct subject) new_subject
from changes_table b
group by 1,2
) b
on a.student_id =  b.student_id
WHEN MATCHED THEN
  UPDATE SET a.subject = array_union(array(new_subject),array(subject)),
WHEN NOT MATCHED
  THEN INSERT (student_id,subject) VALUES (b.student_id,new_subject)
changed_table values = ["sql","python"] 
Final_table values = ["sql","scala"]

Result values ["sql", "python", "scala"]

The output that I am getting is correct.

Question is that would it maintain the array_union maintain order in databricks ?


Solution

  • array_union is a Spark function and we can see its implementation: here on GitHub.

    If you examine it, you will see how it's iterating through both arrays in sequence and adding unique entries to an array buffer, which becomes the final result. Therefore yes, it does preserve order.