Tags: sql, hadoop, hive, insert, sql-insert

Insert data into bucketed Hive table


I'm looking for advice on creating and inserting data into Hive's bucketed tables.

I did some reading (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL+BucketedTables) and tested a few options, but with no success.

Currently I get the following error while running the insert:

Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask

Create code:

CREATE test_in (
id VARCHAR(250), 
field_1 VARCHAR(250), 
field_2 VARCHAR(250), 
field_3 VARCHAR(250),
field_4 VARCHAR(250), 
field_5 VARCHAR(250)
)
PARTITIONED BY(ds STRING)
CLUSTERED BY(id) into 10 buckets
STORED AS orc
tblproperties("orc.compress"="NONE","transactional"="true");

Insert code:

INSERT INTO TABLE test_in
VALUES (
'9gD0xQxOYS',
'ZhQbTjUGLhz8KuQ',
'SmszyJHEqIVAeK8gAFVx',
'RvbRdU7ia1AMHhaXd9tOgLEzi',
'a010E000004uJt8QAE',
'yh6phK4ZG7W4JaOdoOhDJXNJgmcoZU'
)

I need help with the proper syntax for the create/insert statements, and some explanation of bucketing in Hive.


Solution

    1. CREATE STATEMENT - the TABLE keyword is missing (possibly just a typo).
    2. INSERT STATEMENT - the partition details are missing. Since this is a partitioned table, a partition value is required during the INSERT operation.

    The corrected, working queries are below.

    CREATE STATEMENT:

    CREATE TABLE test_in (
    id VARCHAR(250), 
    field_1 VARCHAR(250), 
    field_2 VARCHAR(250), 
    field_3 VARCHAR(250),
    field_4 VARCHAR(250), 
    field_5 VARCHAR(250)
    )
    PARTITIONED BY(ds STRING)
    CLUSTERED BY(id) into 10 buckets
    STORED AS orc
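
    One version-dependent note, as a hedged sketch: on older Hive releases (1.x), bucketing is not enforced automatically, so the session property below is typically set before inserting into a bucketed table; Hive 2.x and later enforce bucketing by default and ignore it.

    -- Only needed on older Hive versions (1.x); harmless elsewhere
    SET hive.enforce.bucketing = true;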
    

    INSERT STATEMENT:

    INSERT INTO test_in
    PARTITION (ds='123')
    VALUES (
    '9gD0xQxOYS',
    'ZhQbTjUGLhz8KuQ',
    'SmszyJHEqIVAeK8gAFVx',
    'RvbRdU7ia1AMHhaXd9tOgLEzi',
    'a010E000004uJt8QAE',
    'yh6phK4ZG7W4JaOdoOhDJXNJgmcoZU'
    )
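
    On the bucketing part of the question: CLUSTERED BY(id) INTO 10 BUCKETS means Hive hashes the id value of each row and writes the row into one of 10 bucket files inside the partition directory. A minimal sketch of how to observe this, assuming the table and the ds='123' partition above (the bucket number and the warehouse path are illustrative):

    -- Read only the rows that hashed into bucket 1 of 10 on id
    SELECT *
    FROM test_in TABLESAMPLE(BUCKET 1 OUT OF 10 ON id) t
    WHERE t.ds = '123';

    -- From the Hive CLI, list the partition directory; up to 10 bucket files should appear
    -- (path depends on your warehouse location)
    dfs -ls /user/hive/warehouse/test_in/ds=123;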
    

    Hope this helps!