Search code examples
arrayshadoophiveudf

Are there any Hive array comparison functions/udf's


Does hive have any array comparison functions/udf's to see if array=array?

For example:

Select a.xyz, b.abc from a left join b on a.C=b.D

Where C and D are arrays


Solution

  • Arrays are ordered structures,

     concat_ws( <separator> , <array> ) 
    

    this function will join all the array elements into a string using the separator.

    create table arrayDemo( id bigint, list array<String> );
    create table dummy( a int );
    insert into table dummy values ( 1 ) ; 
    insert into arraydemo select 1, array("Paperino", "Topolino") from dummy;
    insert into arraydemo select 2, array("Pippo", "Pluto") from dummy;
    insert into arraydemo select 1, array("Pippo", "Pluto") from dummy; 
    
    select * from arraydemo; 
    +---------------+--------------------------+--+
    | arraydemo.id  |      arraydemo.list      |
    +---------------+--------------------------+--+
    | 1             | ["Paperino","Topolino"]  |
    | 2             | ["Pippo","Pluto"]        |
    | 1             | ["Pippo","Pluto"]        |
    +---------------+--------------------------+--+
    
    select * 
         from arraydemo as a1 
              inner join arraydemo as a2 
              on concat_ws( "|", a1.list ) = concat_ws("|", a2.list ); 
    
    +--------+--------------------------+--------+--------------------------+--+
    | a1.id  |         a1.list          | a2.id  |         a2.list          |
    +--------+--------------------------+--------+--------------------------+--+
    | 1      | ["Paperino","Topolino"]  | 1      | ["Paperino","Topolino"]  |
    | 2      | ["Pippo","Pluto"]        | 2      | ["Pippo","Pluto"]        |
    | 1      | ["Pippo","Pluto"]        | 2      | ["Pippo","Pluto"]        |
    | 2      | ["Pippo","Pluto"]        | 1      | ["Pippo","Pluto"]        |
    | 1      | ["Pippo","Pluto"]        | 1      | ["Pippo","Pluto"]        |
    +--------+--------------------------+--------+--------------------------+--+
    

    I'm using it in this example in a "toString()" fashion. Sometimes the best is to compare only the important pieces of the arrays

     select * 
          from arraydemo as a1 
          inner join arraydemo as a2 
          on a1.list[0] = a2.list[0]; 
    

    Hope this helps.