Search code examples
javaapache-sparkelasticsearchhadoophive

Connection and read data from elasticsearch to hive


I want to connect hive to elasticsearch. I followed the instruction from here. I do the following steps

1. start-dfs.sh
2. start-yarn.sh
3. launch elasticsearch
4. launch kibana
5. launch hive
inside hive 
a- create a database
b- create a table
c- load data into the table (LOAD DATA LOCAL INPATH '/home/myuser/Documents/datacsv/myfile.csv' OVERWRITE INTO TABLE students; )
d- add jar /home/myuser/elasticsearch-hadoop-7.10.1/dist/elasticsearch-hadoop-hive-7.10.1.jar
e- create a table for Elastic. 
create table students_es (stt int not null, mahocvien varchar(10), tenho string, ten string, namsinh date, gioitinh string, noisinh string, namvaodang date, trinhdochuyenmon string, hesoluong float, phucaptrachnhiem float, chucvudct string, chucdqh string, dienuutien int, ghichu int) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.nodes' = '127.0.0.1', 'es.port' = '9201', 'es.resource' = 'students/student');

f- insert overwrite table students_es select * from students;  

Then the error I got is the following

FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. org/apache/commons/httpclient/protocol/ProtocolSocketFactory

I used the components kibana: 7.10.1 hive : 3.1.2 hadoop: 3.1.2


Solution

  • I finally found how to solve it. You need to download the jar file commons-httpclient-3.1.jar and put it into your hive lib directory.