Suppose I have the following data:
Table ABC
name | group |
---|---|
Zlatan Ibrahimovic | Group 1 |
Cristiano Ronaldo | Group 2 |
Messi | Group 1 |
Eduardo Pereira Rodrigues | Group 1 |
What do I want as a final result (Group numbers below are random):
name | group |
---|---|
Zlatan Ibrahimovic | 870 |
Cristiano Ronaldo | 544 |
Messi | 870 |
Eduardo Pereira Rodrigues | 870 |
To achieve this result I wrote a query that was working perfectly in AWS Athena, as below:
select table1.name,
subQuery1.random_cod_group
from ABC table1
left join(select group, cast(rand(1234) * 100 as integer) as random_cod_group
from ABC group by group) subQuery1
on subQuery1.group = table1.group
However - to my surprise - when I ran the same query in "impala", the results presented were different, as in the example below. The same groups were being presented with different numbers (instead of 'keeping' the same random number for the same group)
name | group |
---|---|
Zlatan Ibrahimovic | 987 |
Cristiano Ronaldo | 478 |
Messi | 874 |
Eduardo Pereira Rodrigues | 75 |
I really can't understand why... Can anyone help me?
Thank you very much!
Maybe you can try an alternative approach. Possibly a random number is being generated on each call, which implies that the join is not necessarily wrong.
If it is not mandatory to generate random numbers, I suggest using:
select table1.name,
subQuery1.random_cod_group
from ABC table1
left join(select group, row_number() over (order by group) as random_cod_group
from ABC group by group) subQuery1 on subQuery1.group = table1.group