Please consider the following setup.
hadoop version 2.6.4
spark version 2.1.0
OS CentOS Linux release 7.2.1511 (Core)
All software is installed on a single machine as a single node cluster, spark is installed in standalone mode.
I am trying to use Spark Thrift Server.
To start the spark thrift server I run the shell script
start-thriftserver.sh
After running the thrift server, I can run beeline command line tool and issue the following commands: The commands run successfully:
!connect jdbc:hive2://localhost:10000 user_name '' org.apache.hive.jdbc.HiveDriver
create database testdb;
use testdb;
create table names_tab(a int, name string) row format delimited fields terminated by ' ';
My first question is where on haddop is the underlying file/folder for this table/database created? The problem is even if hadoop is stopped using stop-all.sh, still the create table/database command is successful, which makes me think that the table is not created on hadoop at all.
My second question is how do I tell spark where in the world is hadoop installed? and ask spark to use hadoop as the underlying data store for all queries run from beeline.
Am I supposed to install spark in some other mode?
Thanks in advance.
My objective was to get the beeline command line utility work through Spark Thrift Server using hadoop as underlying data-store and I got it to work. My setup was like this:
Hadoop <--> Spark <--> SparkThriftServer <--> beeline
I wanted to configure spark in such a manner that it uses hadoop for all queries run at beeline command line utility. The trick was to specify the following property in spark-defaults.xml.
spark.sql.warehouse.dir hdfs://localhost:9000/user/hive/warehouse
By default spark uses derby for both meta data and the data itself (called warehouse in spark) In order to have spark use hadoop as warehouse I had to add this property.
Here is a sample output
./beeline
Beeline version 1.0.1 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000 abbasbutt '' org.apache.hive.jdbc.HiveDriver
Connecting to jdbc:hive2://localhost:10000
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/abbasbutt/Projects/hadoop_fdw/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/abbasbutt/Projects/hadoop_fdw/apache-hive-1.0.1-bin/lib/hive-jdbc-1.0.1-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Connected to: Spark SQL (version 2.1.0)
Driver: Hive JDBC (version 1.0.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000>
0: jdbc:hive2://localhost:10000>
0: jdbc:hive2://localhost:10000>
0: jdbc:hive2://localhost:10000> create database my_test_db;
+---------+--+
| Result |
+---------+--+
+---------+--+
No rows selected (0.379 seconds)
0: jdbc:hive2://localhost:10000> use my_test_db;
+---------+--+
| Result |
+---------+--+
+---------+--+
No rows selected (0.03 seconds)
0: jdbc:hive2://localhost:10000> create table my_names_tab(a int, b string) row format delimited fields terminated by ' ';
+---------+--+
| Result |
+---------+--+
+---------+--+
No rows selected (0.11 seconds)
0: jdbc:hive2://localhost:10000>
Here are the corresponding files in hadoop
[abbasbutt@localhost test]$ hadoop fs -ls /user/hive/warehouse/
17/01/19 10:48:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 4 items
drwxrwxr-x - abbasbutt supergroup 0 2017-01-18 23:45 /user/hive/warehouse/fdw_db.db
drwxrwxr-x - abbasbutt supergroup 0 2017-01-18 23:23 /user/hive/warehouse/my_spark_db.db
drwxrwxr-x - abbasbutt supergroup 0 2017-01-19 10:47 /user/hive/warehouse/my_test_db.db
drwxrwxr-x - abbasbutt supergroup 0 2017-01-18 23:45 /user/hive/warehouse/testdb.db
[abbasbutt@localhost test]$ hadoop fs -ls /user/hive/warehouse/my_test_db.db/
17/01/19 10:50:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
drwxrwxr-x - abbasbutt supergroup 0 2017-01-19 10:50 /user/hive/warehouse/my_test_db.db/my_names_tab
[abbasbutt@localhost test]$