Search code examples
sqlhivehiveqlnormalize

how to normalize a column in hive?


I'm trying to normalize a column in hive, which means I have to make each value in the column divide max value in that column. For example: I have a table X:

A   B   C   D
1  0.1  3  0.2
2  0.2  4  0.5
3  0.3  5  0.2

I want to normalize column C, the requested table Y is:

A   B    C    D
1  0.1  0.6  0.2
2  0.2  0.8  0.5
3  0.3   1   0.2

I do not know how to write SQL to express the max value of the whole column.


Solution

  • max window function

    select a
          ,b
          ,c / max(c) over () as c
          ,d
    
    from x
    

    +---+-----+-----+-----+
    | a |  b  |  c  |  d  |
    +---+-----+-----+-----+
    | 1 | 0.1 | 0.6 | 0.2 |
    | 2 | 0.2 | 0.8 | 0.5 |
    | 3 | 0.3 | 1   | 0.2 |
    +---+-----+-----+-----+