Search code examples
stringhadoophivehiveqlhortonworks-data-platform

Aggregating strings with hortonworks hadoop hive


I am trying to flatten a security table to make a single row per country. I am using Hive as the execution engine currently in hortonworks if this makes a difference to the SQL required.

An example of what im trying to achieve is below.

(country, Name)
(US, 'Matt'),
(US, 'Rocks'),
(GB, 'Stylus'),
(FR, 'Foo'),
(FR, 'Bar'),
(FR, 'Baz')

The query result I am after :

Country Name
----------- ------------------------------
GB           Stylus
FR           Bar, Baz, Foo
US           Matt, Rocks

I am really lost as to where to start. Does anyone know? I found some syntax on another thread that work fine in MSSQL but not here in hortonworks.

Optimal way to concatenate/aggregate strings

Many thanks


Solution

  • I'm not really familiar with Hadoop, but following Column to comma separated value in Hive you can make this:

    SELECT country, collect_set(Name)
    FROM your_table
    GROUP BY country;