amazon-web-services hive emr presto

AWS EMR Presto not finding correct Hive schemas using AWS Glue


I am having trouble running Presto queries on AWS EMR.

I have launched an EMR cluster running Hive and Presto, using AWS Glue as the metastore.

When I SSH into the master node and run hive I can run "show schemas;" and it shows me the 3 different databases that we have on AWS Glue.

If I then enter the Presto CLI and run "show schemas from hive;" I only see two: "default" and "information_schema".

For the life of me I cannot figure out why Presto is not able to see the same schemas that Hive sees.

It is a basic EMR cluster launched with mostly default settings.

Can someone point me in the direction of what I should be looking for? I have checked the hive.properties file and it looks good. I am at a loss as to why Presto is not able to see the same information as Hive.

I do have the following configuration set:

[{"classification":"hive-site", "properties":{"hive.metastore.client.factory.class":"com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"}, "configurations":[]}]

The AWS docs (http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hive-metastore-glue.html) make it seem that this should be plug and play, but I am obviously missing something.


Solution

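  • Starting with Amazon EMR release version 5.10.0, you can use the AWS Glue Data Catalog as the metastore for Presto. Simply set the hive.metastore.glue.datacatalog.enabled property to true under the presto-connector-hive classification, as follows: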

    [
      {
        "Classification": "presto-connector-hive",
        "Properties": {
          "hive.metastore.glue.datacatalog.enabled": "true"
        }
      }
    ]
    

    Alternatively, you can manually set hive.metastore.glue.datacatalog.enabled=true in the /etc/presto/conf/catalog/hive.properties file on the master node. If you use this method, make sure that hive.table-statistics-enabled=false is also set in the properties file, because the Data Catalog does not support Hive table and partition statistics. If you change the value on a long-running cluster to switch metastores, you must restart the Presto server on the master node (sudo restart presto-server).
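
The manual edit described above can be sketched in Python. This is only a rough illustration, not an AWS-provided tool: the helper name set_properties and the sample file contents are assumptions made for the example, and it works on the file's text so it can be tried without touching a real cluster.

```python
def set_properties(text: str, updates: dict) -> str:
    """Return properties-file text with every key in `updates` set.

    Existing assignments are rewritten in place, missing keys are
    appended, and comments and blank lines are left untouched.
    """
    seen = set()
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        is_comment = stripped.startswith(("#", "!"))
        if stripped and not is_comment and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in updates:
                out.append(f"{key}={updates[key]}")
                seen.add(key)
                continue
        out.append(line)
    # Append any settings that were not already present in the file.
    out.extend(f"{k}={v}" for k, v in updates.items() if k not in seen)
    return "\n".join(out) + "\n"


# The two settings named in the answer above.
GLUE_SETTINGS = {
    "hive.metastore.glue.datacatalog.enabled": "true",
    "hive.table-statistics-enabled": "false",
}

if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    sample = "connector.name=hive-hadoop2\nhive.table-statistics-enabled=true\n"
    print(set_properties(sample, GLUE_SETTINGS), end="")
```

On a real master node the input text would be read from /etc/presto/conf/catalog/hive.properties and written back with root privileges, followed by the Presto server restart mentioned above.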

    Sources: AWS Docs
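
The cluster in the question sets only the hive-site classification, which configures Hive but not the Presto Hive connector; that would be consistent with Hive seeing the Glue databases while Presto does not. As a minimal sketch, assuming you want both engines on the Data Catalog, the two classifications from this page can be combined into one configuration list. The function name glue_configurations is hypothetical; the classification names and property values are copied from the question and answer.

```python
import json


def glue_configurations() -> list:
    """Build an EMR configuration list that points Hive and Presto at Glue."""
    return [
        {
            # From the question: makes Hive use the Glue Data Catalog.
            "Classification": "hive-site",
            "Properties": {
                "hive.metastore.client.factory.class":
                    "com.amazonaws.glue.catalog.metastore."
                    "AWSGlueDataCatalogHiveClientFactory"
            },
        },
        {
            # From the answer: makes the Presto Hive connector use it too.
            "Classification": "presto-connector-hive",
            "Properties": {"hive.metastore.glue.datacatalog.enabled": "true"},
        },
    ]


if __name__ == "__main__":
    # Print JSON that can be saved to a file and supplied at cluster launch.
    print(json.dumps(glue_configurations(), indent=2))
```

The printed JSON can be saved to a file (for example a hypothetical configurations.json) and supplied when the cluster is created, for instance through the --configurations option of aws emr create-cluster.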