Search code examples
sqlimpalahue

HUE/Impala Query will not execute when running


I have been trying to run this query to find matching accounts and every time the query is ran the HUE environment gets to 75% done and then just stays there. I am not sure how to troubleshoot as I have been reading forums and such trying to figure out what is the issue. I can't use COMPUTE STATS as views are not allowed.

SELECT 
    d.eid,
    d.loyaltyProgramId,
    d.playerAccountNumber,
    dcp1.PhoneNumber,
    dcp1.IsPrimary,
    dcp1.IsPreferredContactNumber,
    d.firstName,
    d.LastName,
    d.BirthDate,
    d.Gender,
    d.IsBanned,
    d.bancode,
    dc2_cust.eid,
    dc2_cust.loyaltyprogramid,
    dc2_cust.playerAccountNumber,
    dcp2_cust.PhoneNumber,
    dc2_cust.FirstName,
    dc2_cust.LastName,
    dc2_cust.BirthDate,
    dc2_cust.Gender,
    dc2_cust.IsBanned,
    dc2_cust.BanCode,
    
    
    CONCAT(d.playeraccountnumber, '-', d.LoyaltyProgramId, ',', dc2_cust.playeraccountnumber, '-', dc2_cust.loyaltyprogramid) AS killkey

FROM gmscompliance_ref.ballybi_dcustomer d
JOIN gmscompliance_ref.ballybi_dcustomerphone dcp1 ON d.customerkey = dcp1.customerkey AND d.loyaltyprogramid = dcp1.loyaltyprogramid
JOIN gmscompliance_ref.ballybi_dcustomerphone dcp2_cust ON dcp1.customerkey < dcp2_cust.customerkey and (translate(dcp1.PhoneNumber, '-', ' ') = translate(dcp2_cust.PhoneNumber, '-', ' ') OR dcp1.PhoneNumber = dcp2_cust.PhoneNumber)
JOIN gmscompliance_ref.ballybi_dcustomer dc2_cust ON dcp2_cust.customerkey = dc2_cust.customerkey AND dcp2_cust.loyaltyprogramid = dc2_cust.loyaltyprogramid

WHERE 
    d.PlayerAccountStatus = 'Active'
    AND dc2_cust.PlayerAccountStatus = 'Active'
    AND d.eid <> 0
    AND d.LoyaltyProgramId <> 'GEO'
    AND d.FirstName = dc2_cust.FirstName
    AND d.eid <> dc2_cust.Eid



ORDER BY d.eid;

Solution

  • With impala on hue, the status bar says 75% is the time it took impala to read the file, not 75% of all the steps. Try going to the query plan section (Click to the query id on the right of status bar, eg: fb4a404290538b7d:9c46dee100000000 ) to see which step is slow. With your query, I think you should pay attention to the condition that joining the table "dcp2_cust" can lead to a data explosion.