Tags: mysql, hadoop, hive, hive-metastore, trino

Unable to start Hive and Catalog 'hive' does not exist in Trino


I installed Apache Hive 3, Apache Hadoop 3, MySQL, and Trino to query data. I started the Hive metastore, and MySQL is running. But when I run a simple query in Trino, it fails:

trino> show tables from default; ==> failed: line 1:1: Catalog 'hive' does not exist

When I try to launch the Hive CLI, I get this exception:

Hive Session ID = dd740516-a5d0-4f8d-ae24-065e2cfe889c 
Exception in thread "main" java.lang.ClassCastException: jdk.internal.loader.ClassLoaders$AppClassLoader incompatible with java.net.URLClassLoader 
 at org.apache.hadoop.hive.ql.session.SessionState.<init>(SessionState.java:413) 
 at org.apache.hadoop.hive.ql.session.SessionState.<init>(SessionState.java:389) 
 at org.apache.hadoop.hive.cli.CliSessionState.<init>(CliSessionState.java:60) 
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:705) 
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683) 
 at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
 at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
 at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
 at java.base/java.lang.reflect.Method.invoke(Method.java:566) 
 at org.apache.hadoop.util.RunJar.run(RunJar.java:318) 
 at org.apache.hadoop.util.RunJar.main(RunJar.java:232) 

I tried adding these properties to hive-site.xml, but I still cannot launch the Hive CLI.

<property>
  <name>system:java.io.tmpdir</name>
  <value>/tmp/hive</value>
</property>
<property>
  <name>system:user.name</name>
  <value>${user.name}</value>
</property>

Can someone help, please? Thanks a lot.


Solution

  • I can answer the first question about the Trino CLI.

    Before you can run a query in Trino on your data in HDFS, you need to configure a catalog that uses the Hive connector. Your Trino installation should contain an etc directory, and beneath it an etc/catalog directory.

    Make a new file etc/catalog/hive.properties and add the following configuration.

    connector.name=hive-hadoop2
    hive.metastore.uri=thrift://<your-metastore-ip-address>:9083
    

    Let's break down what these properties mean:

    1. connector.name=hive-hadoop2 indicates that the catalog will use the Trino Hive connector.
    2. hive.metastore.uri=thrift://<your-metastore-ip-address>:9083 tells Trino where to find the metastore that is installed with Hive.

    If you're not sure where to find your metastore IP address, the Hive documentation lists the configuration files that contain it, depending on which version of Hadoop/Hive you are running.

    Hive and Trino share the metastore, but run the queries on entirely different resources. I wrote this blog to help introduce these concepts when folks are starting with Trino. Maybe it can help as you start.

    Assuming there's nothing too complex about your setup, that should be all that is required. In some cases you may also need to set hive.config.resources to the paths of your hdfs-site.xml and core-site.xml files.
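
    As a rough sketch of that last case, etc/catalog/hive.properties might look like the following. The /etc/hadoop/conf paths are only an assumed location for illustration; substitute wherever your Hadoop configuration files actually live.

    ```properties
    # etc/catalog/hive.properties
    connector.name=hive-hadoop2
    hive.metastore.uri=thrift://<your-metastore-ip-address>:9083
    # Assumed paths; point these at your own core-site.xml and hdfs-site.xml
    hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
    ```

    The value is a comma-separated list of files. Trino normally reads catalog files at startup, so restart the server after creating or editing this file; running SHOW CATALOGS; in the Trino CLI should then list hive.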