Search code examples
hdfsimpala

How to load data from CSV into an external table in impala


I am following this solution for loading an external table into Impala as I get the same error if I load data by referring to the file.

So, If I run:

[quickstart.cloudera:21000] > create external table Police2 (Priority string,Call_Type string,Jurisdiction string,Dispatch_Area string,Received_Date string,Received_Time int,Dispatch_Time int,Arrival_Time int,Cleared_Time int,Disposition string) row format delimited
                            > fields terminated by ',' 
                            > STORED as TEXTFILE
                            > location '/user/cloudera/rdpdata/rpd_data_all.csv' ;

I get:

Query: create external table Police2 (Priority string,Call_Type string,Jurisdiction string,Dispatch_Area string,Received_Date string,Received_Time int,Dispatch_Time int,Arrival_Time int,Cleared_Time int,Disposition string) row format delimited
fields terminated by ','
STORED as TEXTFILE
location '/user/cloudera/rdpdata/rpd_data_all.csv'
ERROR: ImpalaRuntimeException: Error making 'createTable' RPC to Hive Metastore: 
CAUSED BY: MetaException: hdfs://quickstart.cloudera:8020/user/cloudera/rdpdata/rpd_data_all.csv is not a directory or unable to create one

and If run the below, nothing get imported.

[quickstart.cloudera:21000] > create external table Police2 (Priority string,Call_Type string,Jurisdiction string,Dispatch_Area string,Received_Date string,Received_Time int,Dispatch_Time int,Arrival_Time int,Cleared_Time int,Disposition string) row format delimited
                            >  fields terminated by ',' 
                            > location '/user/cloudera/rdpdata' ;
Query: create external table Police2 (Priority string,Call_Type string,Jurisdiction string,Dispatch_Area string,Received_Date string,Received_Time int,Dispatch_Time int,Arrival_Time int,Cleared_Time int,Disposition string) row format delimited
 fields terminated by ','
location '/user/cloudera/rdpdata'
Fetched 0 row(s) in 1.01s

and the content of the folder

[cloudera@quickstart ~]$ hadoop fs -ls /user/cloudera/rdpdata
Found 1 items
-rwxrwxrwx   1 cloudera cloudera   75115191 2020-09-02 19:36 /user/cloudera/rdpdata/rpd_data_all.csv

and the content of the file:

[cloudera@quickstart ~]$ hadoop fs -cat  /user/cloudera/rdpdata/rpd_data_all.csv
1,EMSP,RP,RC, 03/21/2013,095454,000000,000000,101659,CANC

and the screenshot of the cloudera quickstart vm enter image description here


Solution

  • The location option in the impala create table statement determines the hdfs_path or the HDFS directory where the data files are stored. Try giving the directory location instead of the file name that should let you use the existing data.

    For your reference : https://impala.apache.org/docs/build/html/topics/impala_tables.html