Search code examples
hivehiveddl

Does Hive need an explicit command to load data into the table from HDFS


What is the difference between this:

CREATE TABLE dbo.table1(
 id INT,
 firstname STRING,

) ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS /usr/hive/sample_data.csv
;

and

CREATE TABLE dbo.table1(
 id INT,
 firstname STRING
) ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
;

then

LOAD DATA INPATH '/usr/hive/sample_data.csv' INTO dbo.table1;

Solution

  • First create sentence is completely wrong one: STORED AS should be TEXTFILE, ORC, Parquet, etc, it is not location and of course when you create table you should not provide file name. Tables in hive are created on locations(folders), not files and the property for tables location is LOCATION, not STORED AS. See recent example: https://stackoverflow.com/a/68095278/2700344

    Second create sentence creates table without location specified (default loacation will be used for managed tables, like this /user/hive/warehouse/dbo/table1 ), see here more details https://stackoverflow.com/a/67073849/2700344 Execute DESC FORMATTED dbo.table1 and check LOCATION.

    Yes, you need load data to be executed because your file is located not in the table location. If you place file into some dedicated location for that table, you can CREATE EXTERNAL TABLE and specify LOCATION. But your file is currently in such folder which should not be used as a table location: /usr/hive. This /usr/hive/table1 looks much better. Alternatvely you can create table like in second CREATE sentence and just copy file into it's location using hadoop fs cp command. LOAD DATA does the same.