Search code examples
sqlhivehiveql

Hive Query to find out the average rating of movies having rating more than 2


I have a table named movierating consisting of the following fields:

| ColumnNO. | Name | DataType |

| Column1 | id | int |

| Column2 | movieid | int |

| Column3 | rating | int |

| Column4 | time | string |

I have even created the table for the above description using SQL query and loaded it with data successfully. Now I am supposed to write a Hive query to find out the average rating of movies having rating more than 2, rounding off the average to two decimal places and save the output in output.txt. In the Bash terminal, I have typed the Hive command as follows (after receiving help from @SKM):

hive -e "select movieid, round(avg(rating), 2) from movierating group by movieid having avg(rating) > 2;" > output.txt

I have even referred here for a similar situation, but it didn't help me much.

Please have a look at the screenshots after I ran the query:

Screenshot1 Screenshot2

And when I run the command to open the output.txt file as vim output.txt, i just get a blank file with no data. I am unable to understand what is being described in the terminal (shown in the screenshot).

Query for table creation:

create table if not exists movierating (id int, movieid int, rating int, time string);
load data local inpath '/tmp/Movie-rating.txt' overwrite into table movierating;

This is the first step in the challenge which I am attending. Since I am new to hive, I am not much familiar with its working.

Challenge description:

Challenge Description

where Step1 was to create the table with the above mentioned fields.

Please do help me in this regard.

Based on Comments

Output after running the query sent by @SKM: Screenshot4


Solution

  • You can change your query like below:

    hive -e "select movieid, round(avg(rating), 2) from movierating group by movieid having avg(rating) > 2;" > output.txt
    

    Snapshot: enter image description here