
Insert data into an Impala table using Java


I have copied a MySQL table into HDFS using Sqoop and then created a table with the same name in Impala using the "create external table" command.

Now I have more data to insert into the Impala table using Impala's Java API, i.e. ImpalaService.jar. How can I insert data into the table using this Java API?

Thanks.


Solution

  • Using ImpalaService.jar, you can send a single insert statement, for example:

    # java -cp ../deps/libthrift-0.9.1.jar:../deps/slf4j-api-1.6.1.jar:../deps/slf4j-simple-1.6.1.jar:../jar/ImpalaService.jar:../jar/ImpalaConnectTest.jar org.ImpalaConnectTest.ImpalaConnectTest localhost 21050 "insert into foo values (1,'message 1')"
    Result size = 0
    #
    

    Then, you can check the result using ImpalaService.jar in the following way:

    # java -cp ../deps/libthrift-0.9.1.jar:../deps/slf4j-api-1.6.1.jar:../deps/slf4j-simple-1.6.1.jar:../jar/ImpalaService.jar:../jar/ImpalaConnectTest.jar org.ImpalaConnectTest.ImpalaConnectTest localhost 21050 "select * from foo"
    Result size = 1
    TRow(colVals:[<TColumnValue i32Val:TI32Value(value:1)>, <TColumnValue stringVal:TStringValue(value:message 1)>])
    # 
    

    or by using impala-shell:

    [root@dub-vcd-vms165 ~]# impala-shell
    Starting Impala Shell without Kerberos authentication
    Connected to XXXXXX
    Server version: impalad version cdh5-1.3.0 RELEASE (build 40e1b62cf0b97f666d084d9509bf9639c575068c)
    Welcome to the Impala shell. Press TAB twice to see a list of available commands.
    
    Copyright (c) 2012 Cloudera, Inc. All rights reserved.
    
    (Shell build version: Impala Shell vcdh5-1.3.0 (40e1b62) built on Tue Mar 25 13:46:44 PDT 2014)
    [XXXXXX:21000] > select * from foo;
    Query: select * from foo
    +----+-----------+
    | id | msg       |
    +----+-----------+
    | 1  | message 1 |
    +----+-----------+
    Returned 1 row(s) in 0.62s
    [XXXXXX:21000] >
    

    Note: If you have many records to insert, you can extend the client program built on ImpalaService.jar into a more elaborate solution that uses its command-line arguments to specify a data source containing all the records to be inserted, and then executes multiple insert statements.
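The "more elaborate solution" from the note can be sketched roughly as follows. This is only an illustration under several assumptions: the records arrive as comma-delimited lines of the form `id,msg`, matching the `foo(id INT, msg STRING)` table used in the example; Impala string literals are escaped with backslashes; and `InsertBuilder`, `quote`, `buildInsert` and `buildInserts` are hypothetical names, not part of ImpalaService.jar. Instead of sending one statement per record, it groups rows into multi-row `insert into ... values (...), (...)` statements, which Impala accepts. Each resulting string could then be passed as the query argument to the ImpalaConnectTest command shown earlier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

/**
 * Hypothetical helper (NOT part of ImpalaService.jar): turns comma-delimited
 * records such as "1,message 1" into INSERT statements for a table shaped
 * like foo(id INT, msg STRING).
 */
final class InsertBuilder {

    private InsertBuilder() {}

    /** Quotes a value as a string literal, backslash-escaping \ and ' (assumed Impala escaping). */
    static String quote(String value) {
        String escaped = value.replace("\\", "\\\\").replace("'", "\\'");
        return "'" + escaped + "'";
    }

    /**
     * Parses one "id,msg" line into a row literal like (1,'message 1').
     * Only the first comma separates the fields, so msg may itself contain commas.
     */
    static String toRowLiteral(String line) {
        int comma = line.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("expected 'id,msg' but got: " + line);
        }
        int id = Integer.parseInt(line.substring(0, comma).trim());
        String msg = line.substring(comma + 1);
        return "(" + id + "," + quote(msg) + ")";
    }

    /** Builds a single multi-row INSERT ... VALUES statement for all given lines. */
    static String buildInsert(String table, List<String> lines) {
        if (lines.isEmpty()) {
            throw new IllegalArgumentException("no records to insert");
        }
        StringJoiner rows = new StringJoiner(", ");
        for (String line : lines) {
            rows.add(toRowLiteral(line));
        }
        return "insert into " + table + " values " + rows;
    }

    /** Splits the records into statements of at most batchSize rows each. */
    static List<String> buildInserts(String table, List<String> lines, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> statements = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += batchSize) {
            int end = Math.min(start + batchSize, lines.size());
            statements.add(buildInsert(table, lines.subList(start, end)));
        }
        return statements;
    }

    public static void main(String[] args) {
        List<String> records = List.of("2,message 2", "3,it's message 3");
        // prints: insert into foo values (2,'message 2'), (3,'it\'s message 3')
        System.out.println(buildInsert("foo", records));
    }
}
```

Grouping rows matters because each `INSERT ... VALUES` statement in Impala produces a separate small data file and cannot be parallelized. For large volumes, writing the files into HDFS (as the initial Sqoop import did) is usually the better route.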