Search code examples
mysqlhadoophivehiveqlhue

How can I select certain rows that the key starts with a prefix in Hive?


A very simple question:

I wanted to select all the rows that their keys have a certain prefix in Hive, but somehow it's not working.

The queries I've tried:

select * from solr_json_history where dt='20170814' and hour='2147' and substr(`_root_`,1,9)='P10004232' limit 100;

SELECT * FROM solr_json_history where dt='20170814' and hour='2147' and `_root_` like 'P19746284%' limit 100; 

My Hue editor just hangs there without returning anything.

I've checked this time range there's data in my table by this query:

select * from solr_json_history where dt='20170814' and hour='2147' limit 15;

It's returning 15 records as expected.

Any help please?

Thanks a lot!


Solution

  • Per @musafir-safwan's request, I've added it as an answer here.

    UPDATE: I'm not able to provide sample data. But my problem got resolved.

    Thanks for the commentator's attention.

    My table does have data, no need to worry about that. Thanks for checking though.

    The problem was due to a bad Hue UI design, when I issued the above two queries, it takes too long (longer than the set timeout on the UI) to get a response back, so simply, the UI doesn't reply anything, or gives a timeout reminder. It just hangs there.

    Also, those two queries essentially making two RPC calls, so they timed out. Then I changed to use below query:

    select `_root_`,json, count(*) from solr_json_history where dt='20170814' and hour='2147' and substr(`_root_`,1,9)='P19746284' group by `_root_`,json;
    

    the difference is that I added a count(*) which turns this query into a map-reduce job thing, thus no timeout limit, and then it returns the result that I wanted.

    YMMV.

    Thanks.