
Unable to initialize Hive with Derby from a Homebrew install


It was my understanding that Derby creates its database file(s) in the current directory, but there are none there.

So I tried to initialize Hive with Derby, but it seems a Derby database already exists:

 schematool --verbose -initSchema -dbType derby


Starting metastore schema initialization to 2.1.0
Initialization script hive-schema-2.1.0.derby.sql
Connecting to jdbc:derby:;databaseName=metastore_db;create=true
Connected to: Apache Derby (version 10.10.2.0 - (1582446))
Driver: Apache Derby Embedded JDBC Driver (version 10.10.2.0 - (1582446))
Transaction isolation: TRANSACTION_READ_COMMITTED
0: jdbc:derby:> !autocommit on
Autocommit status: true
0: jdbc:derby:> CREATE FUNCTION "APP"."NUCLEUS_ASCII" (C CHAR(1)) RETURNS INTEGER LANGUAGE JAVA PARAMETER STYLE JAVA READS SQL DATA CALLED ON NULL INPUT EXTERNAL NAME 'org.datanucleus.store.rdbms.adapter.DerbySQLFunction.ascii'
Error: FUNCTION 'NUCLEUS_ASCII' already exists. (state=X0Y68,code=30000)

Closing: 0: jdbc:derby:;databaseName=metastore_db;create=true
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
    at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:291)
    at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:264)
    at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:505)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: Schema script failed, errorcode 2
    at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:390)
    at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:347)
    at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:287)

So where is it?
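Derby's embedded databases are ordinary directories, so one way to answer "where is it?" is to search for every directory named metastore_db. This is a minimal Python sketch using os.walk. The function name find_metastore_dbs is hypothetical, and which root to search (for example the home directory) is an assumption about where Hive was launched from.

```python
import os
from pathlib import Path


def find_metastore_dbs(root):
    """Return every directory named 'metastore_db' under root, sorted.

    Hypothetical helper: a stray embedded Derby metastore appears in whatever
    directory Hive or schematool was started from.
    """
    found = []
    for dirpath, dirnames, _filenames in os.walk(root):
        if "metastore_db" in dirnames:
            found.append(Path(dirpath) / "metastore_db")
            # Prune the walk: no need to descend into the Derby database itself.
            dirnames.remove("metastore_db")
    return sorted(found)
```

For example, `find_metastore_dbs(Path.home())` lists every metastore_db under your home directory. Pruning `dirnames` in place works because `os.walk` walks top-down by default.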

Update: I reinstalled Hive from scratch using

  brew reinstall hive

And the same error occurs.

Another update: given the new direction of this error, it is now answered within another question.

An answer to a question that is not OS X specific, but otherwise similar, can serve here:

https://stackoverflow.com/a/40017753/1056563

I installed Hive with Homebrew (macOS) at /usr/local/Cellar/hive, and after running schematool -dbType derby -initSchema I get the following error message:

Starting metastore schema initialization to 2.0.0
Initialization script hive-schema-2.0.0.derby.sql
Error: FUNCTION 'NUCLEUS_ASCII' already exists. (state=X0Y68,code=30000)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!

However, I can't find a metastore_db or metastore_db.tmp folder under the install path, so I tried:

find /usr/ -name hive-schema-2.0.0.derby.sql
vi /usr/local/Cellar/hive/2.0.1/libexec/scripts/metastore/upgrade/derby/hive-schema-2.0.0.derby.sql
Comment out the 'NUCLEUS_ASCII' and 'NUCLEUS_MATCHES' functions.
Rerun schematool -dbType derby -initSchema; then everything works!
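The manual edit in that answer (commenting out the two NUCLEUS_* functions) can be scripted. This is a Python sketch, not part of the quoted answer. The function name comment_out_nucleus_functions and the .bak backup file are hypothetical. It assumes each CREATE FUNCTION statement sits on a single line, as it does in the log in the question. The Solution below avoids editing the installed script altogether.

```python
from pathlib import Path

# Assumption: each CREATE FUNCTION statement is on one line in the schema
# script; multi-line statements would need a real SQL parser.
FUNCTIONS = ("NUCLEUS_ASCII", "NUCLEUS_MATCHES")


def comment_out_nucleus_functions(script):
    """Prefix the NUCLEUS_* CREATE FUNCTION lines of a Derby schema script with '-- '.

    Hypothetical helper. Keeps a copy of the original as <script>.bak and
    returns the number of lines commented out.
    """
    script = Path(script)
    original = script.read_text()
    out = []
    changed = 0
    for line in original.splitlines(keepends=True):
        is_create = line.lstrip().upper().startswith("CREATE FUNCTION")
        if is_create and any(f'"{name}"' in line for name in FUNCTIONS):
            out.append("-- " + line)  # SQL line comment disables the statement
            changed += 1
        else:
            out.append(line)
    script.with_name(script.name + ".bak").write_text(original)
    script.write_text("".join(out))
    return changed
```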

Solution

  • Homebrew installs Hive (version 2.3.1) unconfigured. By default it uses an in-process (embedded) Derby database, and Hive already ships the required library.

    The only thing you have to do (immediately after brew install hive) is initialize the database:

    schematool -initSchema -dbType derby
    

    Then you can run hive and it will work. However, if you run hive before initializing the database, Hive will create an incomplete database and fail:

    show tables;
    FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    

    Since the database is only partially created, schematool will now fail as well:

    Error: FUNCTION 'NUCLEUS_ASCII' already exists. (state=X0Y68,code=30000)
    org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
    

    To fix that, you will have to delete the database:

    rm -Rf metastore_db
    

    and run the initialization command again.

    Notice that I deleted metastore_db from the current directory? That points to another problem: Hive is configured to create and use the Derby database in the current working directory, because of the following default value for ‘javax.jdo.option.ConnectionURL’:

    jdbc:derby:;databaseName=metastore_db;create=true
    

    To fix that, create the file /usr/local/opt/hive/libexec/conf/hive-site.xml with the following content:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    
    <configuration>
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:derby:/usr/local/var/hive/metastore_db;create=true</value>
      </property>
    </configuration>
    

    and recreate the database as before. The database now lives in /usr/local/var/hive, so if you accidentally run hive before initializing the DB again, delete it with:

    rm -Rf /usr/local/var/hive
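The delete-and-reinitialize fix (rm -Rf metastore_db, then schematool -initSchema -dbType derby) can be wrapped in one step. This is a minimal Python sketch, assuming schematool is on the PATH. The function name reinit_derby_metastore is hypothetical, and the injectable `run` parameter exists only so the logic can be tried without Hive installed. It runs schematool from the database's parent directory because the default ConnectionURL is relative to the working directory. With the absolute URL from hive-site.xml, you would pass Path("/usr/local/var/hive/metastore_db") instead.

```python
import shutil
import subprocess
from pathlib import Path


def reinit_derby_metastore(db_dir, run=subprocess.run):
    """Delete a (possibly half-created) Derby metastore, then rerun schematool.

    Hypothetical helper; db_dir is the metastore_db directory itself.
    """
    db_dir = Path(db_dir)
    if db_dir.exists():
        # Same effect as `rm -Rf metastore_db`.
        shutil.rmtree(db_dir)
    # With the default relative ConnectionURL, schematool creates metastore_db
    # in its working directory, so start it from db_dir's parent.
    return run(
        ["schematool", "-initSchema", "-dbType", "derby"],
        cwd=db_dir.parent,
        check=True,  # raise CalledProcessError if initialization fails
    )
```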
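The reason the default ConnectionURL puts metastore_db in the current working directory can be shown with a small resolver. Embedded Derby resolves a relative database name against derby.system.home, which defaults to the directory the JVM was started in. This is a simplified Python sketch. The function name derby_db_location is hypothetical, and it only handles the two URL shapes used in this post (no memory:, jar:, or other subsubprotocols).

```python
import os
from pathlib import Path


def derby_db_location(url, system_home=None):
    """Return the directory an embedded Derby JDBC URL points at (simplified).

    Handles 'jdbc:derby:<name>[;attrs]' and
    'jdbc:derby:;databaseName=<name>[;attrs]'. system_home stands in for
    derby.system.home and defaults to the current working directory.
    """
    prefix = "jdbc:derby:"
    if not url.startswith(prefix):
        raise ValueError("not an embedded Derby URL: " + url)
    name, *attrs = url[len(prefix):].split(";")
    for attr in attrs:
        key, _, value = attr.partition("=")
        if key == "databaseName" and not name:
            name = value
    if not name:
        raise ValueError("no database name in " + url)
    home = Path(system_home) if system_home is not None else Path(os.getcwd())
    # Joining an absolute name discards `home`, so absolute URLs are unaffected.
    return home / name
```

With the default URL, `derby_db_location("jdbc:derby:;databaseName=metastore_db;create=true", "/home/me/project")` gives `/home/me/project/metastore_db`. That is why each directory you start hive from gets its own database. With the hive-site.xml URL, the result is `/usr/local/var/hive/metastore_db` regardless of the working directory.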
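The hive-site.xml file can also be generated instead of written by hand. This is a minimal Python sketch using the standard library's xml.etree.ElementTree. The function name write_hive_site is hypothetical. It sets only javax.jdo.option.ConnectionURL, omits the xml-stylesheet line (which, as far as I know, only matters when viewing the file in a browser), and overwrites any existing file at the target path.

```python
import xml.etree.ElementTree as ET


def write_hive_site(path, db_dir):
    """Write a minimal hive-site.xml that points the Derby metastore at db_dir.

    Hypothetical helper; every other Hive property keeps its default.
    Overwrites an existing file at `path`.
    """
    conf = ET.Element("configuration")
    prop = ET.SubElement(conf, "property")
    ET.SubElement(prop, "name").text = "javax.jdo.option.ConnectionURL"
    ET.SubElement(prop, "value").text = f"jdbc:derby:{db_dir};create=true"
    tree = ET.ElementTree(conf)
    ET.indent(tree)  # pretty-print; requires Python 3.9+
    tree.write(path, encoding="UTF-8", xml_declaration=True)
```

For example: `write_hive_site("/usr/local/opt/hive/libexec/conf/hive-site.xml", "/usr/local/var/hive/metastore_db")`.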