I am new to the Hadoop ecosystem. I created a Hive table from CSV files using the query below:
CREATE EXTERNAL TABLE IF NOT EXISTS proxy_data(
date_time TIMESTAMP,time_taken INT, c_ip STRING,
sc_status INT, s_action STRING, sc_bytes INT,
cs_bytes INT, cs_method STRING, cs_uri STRING,
cs_host STRING, uri_port INT, uri_path STRING,
uri_query STRING, username STRING, auth STRING,
supplier_name STRING, content_type STRING, referer STRING,
user_agent STRING, filter_result STRING, categories STRING,
x_virus_id STRING, proxy_ip STRING
)
COMMENT 'Proxy logs'
LOCATION '/user/admin'
tblproperties ("skip.header.line.count"="1");
This query created the table proxy_data and populated it with the values from the CSV files in the specified location.
Now I want to append the rows from another set of CSV files to the same table, again skipping the header line in each file. I have looked at various solutions, but none of them meets my needs.
You may follow this approach: first, load the new CSV files into a staging table (include the skip.header.line.count clause in this table). Then, append the staging table's data to the main table.
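-- Staging table: skip.header.line.count makes Hive skip the CSV header line
create table <my_table_stg>(col1 data_type1, col2 data_type2, ...)
row format delimited fields terminated by ','
tblproperties ("skip.header.line.count"="1");
-- Main table (only needed if it does not exist yet)
create table <my_table>(col1 data_type1, col2 data_type2, ...);
-- Replace the staging table's contents with the new file
load data inpath '/file/location/my_file.csv' overwrite into table <my_table_stg>;
-- Append the staged rows to the main table
insert into table <my_table> select * from <my_table_stg>;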
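P.S.: Your table definition doesn't have a row format delimited clause. Without it, Hive uses its default field delimiter (Ctrl-A, \001) and will not split your lines on commas, so please make sure you add it as shown above.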
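Applied to the proxy_data table from the question, the approach might look like the sketch below. The staging table name proxy_data_stg and the HDFS directory /data/new_proxy_csv are hypothetical placeholders; the column list is copied from the question's definition, and the comma delimiter is an assumption about the CSV files.

```sql
-- Hypothetical staging table; same columns as proxy_data, comma-delimited,
-- with the header line skipped via skip.header.line.count
CREATE TABLE IF NOT EXISTS proxy_data_stg(
date_time TIMESTAMP, time_taken INT, c_ip STRING,
sc_status INT, s_action STRING, sc_bytes INT,
cs_bytes INT, cs_method STRING, cs_uri STRING,
cs_host STRING, uri_port INT, uri_path STRING,
uri_query STRING, username STRING, auth STRING,
supplier_name STRING, content_type STRING, referer STRING,
user_agent STRING, filter_result STRING, categories STRING,
x_virus_id STRING, proxy_ip STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
TBLPROPERTIES ("skip.header.line.count"="1");

-- Hypothetical HDFS directory holding the new CSV files;
-- LOAD DATA INPATH moves the files into the staging table
LOAD DATA INPATH '/data/new_proxy_csv' OVERWRITE INTO TABLE proxy_data_stg;

-- Append the staged rows to the existing table
INSERT INTO TABLE proxy_data SELECT * FROM proxy_data_stg;
```

Because proxy_data itself also sets skip.header.line.count, Hive may skip the first line of the header-less files that INSERT writes into /user/admin as well, so it is worth comparing row counts in proxy_data_stg and proxy_data after the insert.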