Tags: arrays, pyspark, find-occurrences

Get the most common element of an array using Pyspark


How can I get the most common element of an array after concatenating two columns using PySpark?

from pyspark.sql import functions as F

df = spark.createDataFrame([
  [['a','a','b'],['a']],
  [['c','d','d'],['']],
  [['e'],['e','f']],
  [[''],['']]
]).toDF("arr_1","arr_2")

df_new = df.withColumn('arr', F.concat(F.col('arr_1'), F.col('arr_2')))

Expected output:

+------------------------+
| arr  | arr_1   | arr_2 |
+------------------------+
| [a]  | [a,a,b] | [a]   |
| [d]  | [c,d,d] | []    |
| [e]  | [e]     | [e,f] |
| []   | []      | []    | 
+------------------------+

Solution

  • Try this: explode the concatenated array, count each element per row, and keep the element(s) with the highest count:

    from pyspark.sql.functions import col, concat, desc, explode, monotonically_increasing_id, rank
    from pyspark.sql.window import Window

    # tag each row with an id and build the concatenated array
    df1 = df.select('arr_1', 'arr_2', monotonically_increasing_id().alias('id'),
                    concat('arr_1', 'arr_2').alias('arr'))

    # explode, count each element per row, rank by count within each row,
    # keep the most frequent element, and join back to the original columns
    df1.select('id', explode('arr')).\
       groupBy('id', 'col').count().\
       select('id', 'col', 'count',
              rank().over(Window.partitionBy('id').orderBy(desc('count'))).alias('rank')).\
       filter(col('rank') == 1).\
       join(df1, 'id').\
       select(col('col').alias('arr'), 'arr_1', 'arr_2').show()
    
    +---+---------+------+
    |arr|    arr_1| arr_2|
    +---+---------+------+
    |  a|[a, a, b]|   [a]|
    |   |       []|    []|
    |  e|      [e]|[e, f]|
    |  d|[c, d, d]|    []|
    +---+---------+------+
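
If you want the result back as an array column, like the expected output in the question, one option is a small variant of the same approach that collects the rank-1 element(s) of each row into an array with collect_list before joining. This is a sketch building on df1 and the imports above (the name top is just illustrative); ties and empty strings are kept as elements, exactly as in the answer above.

    from pyspark.sql.functions import collect_list

    # gather the top-ranked element(s) of each row into an array,
    # so the result has the same array shape as the expected output
    top = df1.select('id', explode('arr')).\
        groupBy('id', 'col').count().\
        withColumn('rank', rank().over(Window.partitionBy('id').orderBy(desc('count')))).\
        filter(col('rank') == 1).\
        groupBy('id').agg(collect_list('col').alias('arr'))

    top.join(df1.drop('arr'), 'id').select('arr', 'arr_1', 'arr_2').show()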