I am having trouble running Presto queries on AWS EMR.
I have launched an EMR cluster running Hive and Presto, with AWS Glue as the metastore.
When I SSH into the master node and run hive, I can run "show schemas;" and it shows me the 3 databases that we have in AWS Glue.
If I then enter the Presto CLI and run "show schemas on hive", I only see two: "default" and "information_schema".
For the life of me I cannot figure out why Presto is not able to see the same Hive schemas.
It is a basic cluster launch on EMR, mostly using default settings.
Can someone point me in the direction of what I should be looking for? I have checked the hive.properties file and it looks good. I am just at a loss as to why Presto is not able to see the same info as Hive.
I do have the following configuration set:
[{"classification":"hive-site", "properties":{"hive.metastore.client.factory.class":"com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"}, "configurations":[]}]
The AWS docs (http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hive-metastore-glue.html) make it seem that this should be plug and play, but I am obviously missing something.
Starting with Amazon EMR release version 5.10.0, you can use the AWS Glue Data Catalog as the Hive metastore for Presto. Simply set the hive.metastore.glue.datacatalog.enabled property to true, as follows:
[
  {
    "Classification": "presto-connector-hive",
    "Properties": {
      "hive.metastore.glue.datacatalog.enabled": "true"
    }
  }
]
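If the cluster is created from a script, the Hive classification from the question and the Presto classification above can be combined into one configurations list. The following is only a minimal Python sketch of that idea; the function name build_glue_configurations is hypothetical, and the two property names are the ones quoted in this thread.

```python
import json


def build_glue_configurations():
    """Return EMR configuration objects pointing Hive and Presto at the Glue Data Catalog."""
    return [
        {
            # Hive side: the setting already shown in the question.
            "Classification": "hive-site",
            "Properties": {
                "hive.metastore.client.factory.class":
                    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
            },
        },
        {
            # Presto side: the setting from this answer (EMR 5.10.0 or later).
            "Classification": "presto-connector-hive",
            "Properties": {"hive.metastore.glue.datacatalog.enabled": "true"},
        },
    ]


if __name__ == "__main__":
    # Print the JSON so it can be redirected into a configurations file.
    print(json.dumps(build_glue_configurations(), indent=2))
```

The printed JSON can be saved to a file and supplied at cluster creation, for example through the --configurations option of aws emr create-cluster.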
Alternatively, you can manually set hive.metastore.glue.datacatalog.enabled=true in the /etc/presto/conf/catalog/hive.properties file on the master node. If you use this method, make sure that hive.table-statistics-enabled=false is set in the properties file, because the Data Catalog does not support Hive table and partition statistics. If you change the value on a long-running cluster to switch metastores, you must restart the Presto server on the master node (sudo restart presto-server).
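For the manual route, the edit to hive.properties amounts to setting two keys. A rough Python sketch of that edit follows, working on the file's text only; the helper set_properties and the sample file contents are made up for illustration, and the real file on your cluster will differ.

```python
def set_properties(text, updates):
    """Return properties-file text with each key in `updates` set to its value.

    Existing keys are rewritten in place; missing keys are appended at the end.
    """
    remaining = dict(updates)
    out = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        is_setting = "=" in line and not line.lstrip().startswith("#")
        if is_setting and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    # Append any keys that were not already present in the file.
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


# The two settings named in this answer.
GLUE_SETTINGS = {
    "hive.metastore.glue.datacatalog.enabled": "true",
    "hive.table-statistics-enabled": "false",
}

if __name__ == "__main__":
    # Sample input is invented for illustration only.
    sample = "connector.name=hive-hadoop2\nhive.table-statistics-enabled=true\n"
    print(set_properties(sample, GLUE_SETTINGS), end="")
```

After writing the result back to /etc/presto/conf/catalog/hive.properties, the Presto server still needs the restart mentioned above before the change takes effect.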
Sources: AWS Docs