Search code examples
hiveimpala

Impala query cannot retrieve result with NullPointerException


I have the following query that I run over hive/impala:

select count(p.id) as tweet_count, p.author as author,p.profile_image_url as profile_image_url,p.screen_name as screen_name,
concat_ws('/',min(p.postday),min(p.postmonth),min(p.postyear) ) as creation_date,p.message message,af.followerid as follower 
from post p 
inner join author_follower af on af.id like if(p.author= null, '', concat(p.author,'%'))
where p.hashtaglist like 'hashtagtobeused' 
group by author,profile_image_url,screen_name,message,follower
ORDER BY cast(min(postyear) as int),cast(min(postmonth) as int),cast(min(postday) as int),cast(min(posthour) as int) ASC;

but for some reason I get the following error result

Your query has the following error(s):

Bad status for request 3304: TGetOperationStatusResp(status=TStatus(errorCode=None, errorMessage=None, sqlState=None, infoMessages=None, statusCode=0), operationState=5, errorMessage=None, sqlState=None, errorCode=None)

I checked the query and I cannot find a problem with it, can anyone please help and guide to where the problem is?why do I have this error instead of result set


Solution

  • Consider carefully reformatting the query, as in some cases, Impala crashes with SEGV when the SQL parse itself fails due to simple issues like spaces. If you're running Cloudera, you'll find logs in /run/cloudera-scm-agent/process on the node that ran the query.

    We have resolved these issues by being careful about SQL formatting (which is also just good practice as it make query errors easier to spot), e.g.

    SELECT
        COUNT(p.id)                                                     AS tweet_count,
        p.author                                                        AS author,
        p.profile_image_url                                             AS profile_image_url,
        p.screen_name                                                   AS screen_name,
        concat_ws('/', MIN(p.postday), MIN(p.postmonth), MIN(p.postyear) ) AS creation_date,
        p.message                                                       AS MESSAGE,
        af.followerid                                                   AS follower
    FROM
        post p
    INNER JOIN
        author_follower af
    ON
        af.id LIKE IF(p.author = NULL, '', concat(p.author, '%'))
    WHERE
        p.hashtaglist LIKE 'hashtagtobeused'
    GROUP BY
        author,
        profile_image_url,
        screen_name,
        MESSAGE,
        follower
    ORDER BY
        CAST(MIN(postyear) AS INT),
        CAST(MIN(postmonth) AS INT),
        CAST(MIN(postday) AS INT),
        CAST(MIN(posthour) AS INT) ASC;
    

    (By the way, I used dbVisualizer to validate and reformat the query syntax -- great tool to consider)