Search code examples
subqueryleft-join

IMPALA LEFT join with subquery


Suppose I have the following data:

Table ABC

name group
Zlatan Ibrahimovic Group 1
Cristiano Ronaldo Group 2
Messi Group 1
Eduardo Pereira Rodrigues Group 1

What do I want as a final result (Group numbers below are random):

name group
Zlatan Ibrahimovic 870
Cristiano Ronaldo 544
Messi 870
Eduardo Pereira Rodrigues 870

To achieve this result I wrote a query that was working perfectly in AWS Athena, as below:

select table1.name,
       subQuery1.random_cod_group
from ABC table1
left join(select group, cast(rand(1234) * 100 as integer) as random_cod_group
          from ABC group by group) subQuery1
     on subQuery1.group = table1.group

However - to my surprise - when I ran the same query in "impala", the results presented were different, as in the example below. The same groups were being presented with different numbers (instead of 'keeping' the same random number for the same group)

name group
Zlatan Ibrahimovic 987
Cristiano Ronaldo 478
Messi 874
Eduardo Pereira Rodrigues 75

I really can't understand why... Can anyone help me?

Thank you very much!


Solution

  • Maybe you can try an alternative approach. Possibly a random number is being generated on each call, which implies that the join is not necessarily wrong.

    If it is not mandatory to generate random numbers, I suggest using:

    select table1.name,
           subQuery1.random_cod_group
    from ABC table1
    left join(select group, row_number() over (order by group) as random_cod_group
              from ABC group by group) subQuery1 on subQuery1.group = table1.group