I have created the hive external table using below command:
use hive2;
create external table depTable (depId int comment 'This is the unique id for each dep', depName string,location string) comment 'department table' row format delimited fields terminated by ","
stored as textfile location '/dataDir/';
Now, when I view the HDFS I can see the db but there is no depTable
inside the warehouse.
[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse/hive2.db
[cloudera@quickstart ~]$
Above you can see that there is no table created in this DB. As far as I know, external tables are not stored in the hive warehouse.So am I correct ?? If yes then where is it stored ??
But if I create external table first and then load the data then I am able to see the file inside hive2.db
.
hive> create external table depTable (depId int comment 'This is the unique id for each dep', depName string,location string) comment 'department table' row format delimited fields terminated by "," stored as textfile;
OK
Time taken: 0.056 seconds
hive> load data inpath '/dataDir/department_data.txt' into table depTable;
Loading data to table default.deptable
Table default.deptable stats: [numFiles=1, totalSize=90]
OK
Time taken: 0.28 seconds
hive> select * from deptable;
OK
1001 FINANCE SYDNEY
2001 AUDIT MELBOURNE
3001 MARKETING PERTH
4001 PRODUCTION BRISBANE
Now, if I fire the hadoop fs
query I can see this table under database as below:
[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse/hive2.db
Found 1 items
drwxrwxrwx - cloudera supergroup 0 2019-01-17 09:07 /user/hive/warehouse/hive2.db/deptable
If I delete the table still I am able to see table in the HDFS as below:
[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse/hive2.db
Found 1 items
drwxrwxrwx - cloudera supergroup 0 2019-01-17 09:11 /user/hive/warehouse/hive2.db/deptable
So, what is the exact behavior of the external tables ?? When I create using LOCATION
keyword where does it get stored and when I create using load
statement why it is getting stored in the HDFS
and after deleting why it doesn't get deleted.
The main difference between EXTERNAL
and MANAGED
tables is in Drop table/partition behavior.
When you drop MANAGED
table/partition, the location with data files also removed.
When you drop EXTERNAL
table, the location with data files remains as is.
UPDATE: TBLPROPERTIES ("external.table.purge"="true")
in release 4.0.0+ (HIVE-19981) when set on external table would delete the data as well.
EXTERNAL
table as well as MANAGED
is being stored in the location specified in DDL. You can create table on top of existing location with data files already in the location and it will work for both EXTERNAL
or MANAGED
, does not matter.
You even can create both EXTERNAL
and MANAGED
tables on top of the same location, see this answer with more details and tests: https://stackoverflow.com/a/54038932/2700344
If you specified location, the data will be stored in that location for both types of tables. If you did not specify location, the data will be in default location: /user/hive/warehouse/database_name.db/table_name
for both managed and external tables.
Update: Also there can be some restrictions on location depending on platform/vendor, see https://stackoverflow.com/a/67073849/2700344, you may not be allowed to create manged/external tables outside their default allowed root location.
See also official Hive docs on Managed vs External Tables