Search code examples
amazon-web-servicesaws-lambdahbaseamazon-emr

Accessing HBase on Amazon EMR with Athena


Has anyone managed to access HBase running as a service on Amazon EMR cluster with Athena? I'm trying to establish a connection to the HBase instance, but the lambda (provided with Athena java function) fails with the following error:

Failed to invoke lambda function due to 
com.amazonaws.services.lambda.invoke.LambdaFunctionException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=3, 
exceptions: Tue Aug 31 09:42:35 UTC 2021, 
RpcRetryingCaller{globalStartTime=1630402955107, pause=500, retries=3}, 
org.apache.hadoop.hbase.MasterNotRunningException: java.net.UnknownHostException: can 
not resolve ip-10-113-8-29.my.domain.com,16000,1630400215973 Tue Aug 31 09:42:35 
UTC 2021, RpcRetryingCaller{globalStartTime=1630402955107, pause=500, retries=3}

my.domain.com in this case is a part of VPC dhcp options set. Both lambda and EMR cluster belong to the same VPC, so both have same dhcp options. Obviously lambda can't resolve dns name. Can you please help me, how should I inject dns names to lambda function? or is there some other solutions?

PS. HBase on EMR cluster is up and running. And I have another lambda - python function, that successfully puts data into this database, but this lambda uses shared EMR master public DNS url, I can provide one to Athena's lambda function, i.e. ec2-3-131-xx-xxx.us-east-2.compute.amazonaws.com:16000:2181, but I guess the function gathers internal DNS names from zookeeper or some other services on EMR...


Solution

  • Finally, the solution for the issue is to create appropriate dns records for each cluster ec2 instance with the necessary names inside Amazon Route53 service.