Search code examples
hadoophivehortonworks-data-platformapache-tez

Hive on Tez Pushdown Predicate doesn't work in view using window function on partitioned table


Using Hive on Tez running this query against this view causes a full table scan even though there is a Partition on regionid and id. This query in Cloudera Impala takes 0.6s to complete and using Hortonworks Data Platform and Hive on Tez it takes 800s. I've come to the conclusion that in Hive on Tez using a window function prevents the predicate to be pushed down to the inner select causing the full table scan.

CREATE VIEW latestposition AS
WITH t1 AS (
  SELECT *, ROW_NUMBER() OVER ( PARTITION BY regionid, id, deviceid order by ts desc) AS rownos FROM positions 
)
SELECT *
FROM t1
WHERE rownos = 1; 

SELECT * FROM latestposition WHERE  regionid='1d6a0be1-6366-4692-9597-ebd5cd0f01d1' and id=1422792010 and deviceid='6c5d1a30-2331-448b-a726-a380d6b3a432';

I've tried joining this table to itself using the MAX function to get the latest record, it works, and finishes in a few seconds but it still is too slow for my use case. Also if I remove the window function the predicate gets pushed down and this will return in milliseconds.

If anyone has any ideas it would be much appreciated.


Solution

  • For anyone that is interested, I posted this question on the Hortonworks Community forum. The good guys over there raised a bug for this issue on the Hive Jira and are actively working on it.

    https://community.hortonworks.com/questions/8880/hive-on-tez-pushdown-predicate-doesnt-work-in-part.html

    https://issues.apache.org/jira/browse/HIVE-12808