
How to use Hive on HDFS?


I have set up an Apache Hive environment.

I have created a database named cx like this:

hive>create database cx;
OK
Time taken: 0.32 seconds
hive (default)> show databases;
OK
cx
default
Time taken: 0.032 seconds, Fetched: 2 row(s)
hive (default)>

When I use the "DESCRIBE DATABASE" command to check the details of database cx, I find that it is stored on the local filesystem:

hive> describe database cx;
OK
cx      file:/user/hive/warehouse/cx.db root    USER
Time taken: 0.038 seconds, Fetched: 1 row(s)

My question is: how do I store this database on HDFS?

These are my hive-site.xml settings:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/user/hive/warehouse/metastore_db;create=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
</configuration>

This is my .hiverc file:

set hive.cli.print.current.db=true;
set hive.exec.mode.local.auto=true;

Solution

  • Hive is a metadata service layer on top of Hadoop storage, i.e. HDFS or HBase.

    Hive doesn't store the actual data itself; the data lives in HDFS or in NoSQL stores such as HBase or Cassandra.

    Hive provides a table-management, relational view over HDFS data. The actual data sits in HDFS, while the metadata (database names, table names, view names, and so on) is stored in the Hive Metastore.

    Hive databases are directories in HDFS with a .db extension. All database directories live under the warehouse location in HDFS, i.e. /user/hive/warehouse (set by hive.metastore.warehouse.dir).

    So when we create a database using Hive, Hive internally creates a directory in HDFS and maps this directory to the database name in the Hive metadata.
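
The file: prefix in the DESCRIBE DATABASE output suggests that Hive resolved the warehouse path against the local filesystem. That is what happens when the path carries no scheme and Hadoop's default filesystem (fs.defaultFS) does not point at HDFS. As a minimal sketch of one way to address this, assuming a NameNode reachable at hdfs://namenode:8020 (a hypothetical host and port, not taken from the question), the warehouse directory in hive-site.xml can be given a fully qualified URI:

```xml
<property>
  <name>hive.metastore.warehouse.dir</name>
  <!-- Assumption: namenode:8020 is a placeholder; substitute your own NameNode address -->
  <value>hdfs://namenode:8020/user/hive/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>
```

A database that already exists keeps the location recorded in the metastore, so cx would probably need to be dropped and recreated after the change to pick up the new warehouse path. Alternatively, setting fs.defaultFS in core-site.xml to the same HDFS URI should let the unqualified /user/hive/warehouse path resolve to HDFS, provided Hive can see that Hadoop configuration.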