Search code examples
sqlhadoophiveclouderaimpala

(Hive, SQL) - How to sort a list of string inside a column?


I have a big data problem in Hive (SQL).

SELECT genre, COUNT(*) AS unique_count
FROM table_name
GROUP BY genre

which gives result like:

genre           |   unique_count
----------------------------------
Romance,Crime,Drama,Law | 1560
Crime,Drama,Law,Romance | 895
Law,Romance,Crime,Drama | 942
Adventure,Action        | 3250
Action,Adventure        | 910

What I want is to sort the elements in genre ASC|DESC and get results like

genre           |   unique_count
----------------------------------
Crime,Drama,Law,Romance | 3397
Action,Adventure        | 4160

I could do this in Python but I have over 200 Million rows of data. I'm not aware of any reasonable way I can move that data. So how can I achieve this?


Solution

  • select      concat_ws(',',sort_array(split(genre,','))) as genre
               ,count(*)                                    as unique_count
    
    from        table_name
    
    group by    concat_ws(',',sort_array(split(genre,',')))