Search code examples
sqlhivehiveqlwindow-functionsaggregation

Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns


I have a dataset with booking hotels. date_in has format "yyyy-MM-dd". I need select top 10 the most visited hotel by month.

SELECT top_visits.date_ci, top_visits.hotel_id, top_visits.count_visits
FROM (
   SELECT  date_ci, hotel_id, COUNT(id) AS count_visits, 
   RANK() OVER (
   PARTITION BY date_ci, hotel_id ORDER BY COUNT(id) DESC) as rank
   FROM ( 
      SELECT id, hotel_id, SUBSTRING(my_tab.date_in, 1, 7) as date_ci 
      FROM my_database.my_tab) x
   ) top_visits
GROUP BY date_ci, hotel_id HAVING rank <= 10;

I get the following error:

Error: Error while compiling statement: FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies. Underlying error: org.apache.hadoop.hive.ql.parse.SemanticException: Line 4:13 Expression not in GROUP BY key 'hotel_id'


Solution

  • Move COUNT(id) aggregation into the subquery, add group by:

    SELECT top_visits.date_ci, top_visits.hotel_id, top_visits.count_visits
    FROM (
          SELECT date_ci, hotel_id, count_visits, 
                 RANK() OVER (PARTITION BY date_ci, hotel_id ORDER BY count_visits DESC) as rank
            FROM ( 
                  SELECT hotel_id, SUBSTRING(my_tab.date_in, 1, 7) as date_ci,
                         COUNT(id) AS count_visits 
                    FROM my_database.my_tab
                  GROUP BY hotel_id, SUBSTRING(my_tab.date_in, 1, 7)
                 ) x
         ) top_visits
    WHERE rank <= 10;