I'm trying to set up a local Hive instance and want to use the local filesystem for both my metastore and my data warehouse. Is it possible to do that without using Derby?
Following "How to use Hive without hadoop", I set up my hive-site.xml as follows:
<configuration>
<property>
<name>hive.metastore.local</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.metadb.dir</name>
<value>file:///var/metastore/metadb/</value>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>file:///var/metastore/metadb/</value>
<description></description>
</property>
<property>
<name>fs.default.name</name>
<value>file:///tmp</value>
</property>
</configuration>
I expected to be able to run hive from my terminal without any problems, but I get the following error:
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType ...) to create the schema. If needed, don't forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ?createDatabaseIfNotExist=true for mysql))
I am not using a JDBC metastore (i.e. Derby), so why do I still need a JDBC connection string, as the error message suggests? Is it even possible to run a local Hive instance without Derby?
The Hive metastore cannot run on a filesystem alone; it needs a relational database. The Hive warehouse is a separate thing: it is where the data for internal (managed) Hive tables is stored, and it can live on any Hadoop-compatible filesystem (such as local disk).
Derby can run in memory or persist its database to disk, but using MySQL or Postgres will give better performance.
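As a rough sketch of what pointing the metastore at MySQL could look like in hive-site.xml: the javax.jdo.option.* property names are Hive's standard JDBC metastore settings, but the host, port, database name, user, and password below are placeholders I made up, and the driver class shown assumes a recent MySQL Connector/J jar is on Hive's classpath (older connector versions use com.mysql.jdbc.Driver).

```xml
<configuration>
  <!-- JDBC URL of the metastore database; host, port and database name are placeholders -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive_metastore?createDatabaseIfNotExist=true</value>
  </property>
  <!-- Driver class; assumes MySQL Connector/J 8 or later -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
  </property>
  <!-- Placeholder credentials, replace with your own -->
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepassword</value>
  </property>
</configuration>
```

With a setup like this, the schema would still need to be created once with schematool -initSchema -dbType mysql before starting Hive.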
Note: Hive still requires the Hadoop libraries, so running it "without Hadoop" isn't possible, even if you aren't using YARN or HDFS.
Also, the property fs.default.name has been deprecated and replaced by fs.defaultFS, which must be set in core-site.xml; it is not a valid hive-site.xml property.
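For illustration, a minimal core-site.xml that keeps everything on the local filesystem might look like the following. The value file:/// is also Hadoop's built-in default for this property, so treat this as a sketch of where the setting belongs rather than something you necessarily have to add.

```xml
<configuration>
  <!-- Default filesystem for Hadoop clients; file:/// means the local disk -->
  <property>
    <name>fs.defaultFS</name>
    <value>file:///</value>
  </property>
</configuration>
```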
"I am not using a JDBC metastore (i.e. derby)"
Yes, you are, via Hive's default properties:
javax.jdo.option.ConnectionURL=jdbc:derby:;databaseName=metastore_db;create=true
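If the goal is simply to keep both the metastore and the warehouse on local disk, one option is to stay with embedded Derby but point it at an explicit directory. The hive-site.xml below is a sketch under that assumption: the metastore path reuses the directory from the question, while the warehouse path is a hypothetical location chosen so the two do not share a directory.

```xml
<configuration>
  <!-- Embedded Derby database persisted under a fixed local directory -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/var/metastore/metadb/metastore_db;create=true</value>
  </property>
  <!-- Managed table data on the local filesystem (hypothetical path) -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>file:///var/metastore/warehouse</value>
  </property>
</configuration>
```

As the error message says, the metastore schema still has to be initialized once before the first run, for example with schematool -initSchema -dbType derby.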