hdfs, impala, sqoop, apache-kudu

Loading data from HDFS to Kudu


I'm trying to load data to a Kudu table but getting a strange result.

In the Impala console I created an external table from the four HDFS files imported by Sqoop:

drop table if exists hdfs_datedim;
create external table hdfs_datedim
( ... )
row format
 delimited fields terminated by ','
location '/user/me/DATEDIM';

A SELECT COUNT(*) tells me there are lots of rows present, and the data looks good when queried.

I use a standard INSERT ... SELECT to copy the rows into Kudu:

INSERT INTO impala_kudu.DATEDIM
SELECT * FROM hdfs_datedim;

A SELECT COUNT(*) tells me impala_kudu.DATEDIM has four rows (the number of files in HDFS, not the number of rows in the source table).

Any ideas?


Solution

  • Under the covers, the data created by Sqoop was a sequence of poorly formatted CSV files. The load failed without raising an error because of the data in the flat files. Watch out for date formats and for text strings that contain the field delimiter.
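Before running the INSERT, it can help to scan the Sqoop output for the two problems named in the answer. This is a minimal, hypothetical Python checker, not part of Impala, Kudu or Sqoop. The expected field count, date column index and date format are placeholders (the real column list is elided in the DDL), and it assumes the files are available as plain text lines, for example via `hdfs dfs -cat /user/me/DATEDIM/*`. It splits on every comma, the way a table declared with `fields terminated by ','` does, so a quoted string containing a comma shows up as an extra field.

```python
from datetime import datetime
from typing import Iterable, List, Tuple

# Hypothetical values: the real DATEDIM column list is elided in the question.
EXPECTED_FIELDS = 4
DATE_COLUMN = 1           # 0-based index of a date column (assumption)
DATE_FORMAT = "%Y-%m-%d"  # date format the target column expects (assumption)


def find_bad_rows(lines: Iterable[str],
                  expected_fields: int = EXPECTED_FIELDS,
                  date_column: int = DATE_COLUMN,
                  date_format: str = DATE_FORMAT) -> List[Tuple[int, str]]:
    """Return (line_number, reason) pairs for rows likely to load badly."""
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        # Split on every comma, like a plain delimited text table;
        # quotes around a field do not protect an embedded comma here.
        fields = line.split(",")
        if len(fields) != expected_fields:
            problems.append((lineno, f"expected {expected_fields} fields, got {len(fields)}"))
            continue
        # Flag dates that do not match the format the target column expects.
        try:
            datetime.strptime(fields[date_column], date_format)
        except ValueError:
            problems.append((lineno, f"unparseable date {fields[date_column]!r}"))
    return problems


if __name__ == "__main__":
    sample = [
        "1,2015-01-31,January,Q1",
        "2,31/01/2015,January,Q1",
        '3,2015-02-01,"February, short",Q1',
    ]
    for lineno, reason in find_bad_rows(sample):
        print(lineno, reason)
```

On the sample it reports line 2 (a day-first date) and line 3 (a comma inside a quoted string, which yields five fields instead of four). Cleaning such rows, or re-running the Sqoop import with a delimiter that never occurs in the data (Sqoop's `--fields-terminated-by` option), is one way to address them; treat this as a diagnostic sketch rather than a complete fix.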