I created a simple four-node Hadoop cluster with CDH 4.7, including Impala 1.1. I'm able to copy CSV files to HDFS and create and query Impala tables over the data, as described in the tutorial.
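For reference, the setup on the first node went something like this; the file name here is reconstructed from memory, and the CREATE TABLE statement is the one quoted below:
hdfs dfs -mkdir -p /user/dwheeler/sample_data/tab1
hdfs dfs -put tab1.csv /user/dwheeler/sample_data/tab1/
But I can't query the same table on a different data node: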
[example.com:21000] > select * from tab1;
Query: select * from tab1
ERROR: AnalysisException: Table does not exist: default.tab1
I thought perhaps I needed to reissue the CREATE TABLE statement on the second node, but when I do, it suddenly knows the table's there:
[example.com:21000] > CREATE EXTERNAL TABLE tab1
> (
> id INT,
> col_1 BOOLEAN,
> col_2 DOUBLE,
> col_3 TIMESTAMP
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
> LOCATION '/user/dwheeler/sample_data/tab1';
Query: create EXTERNAL TABLE tab1
(
id INT,
col_1 BOOLEAN,
col_2 DOUBLE,
col_3 TIMESTAMP
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/dwheeler/sample_data/tab1'
ERROR: AlreadyExistsException: Table tab1 already exists
So the node knows the table is there, but I can't query it, or even refresh it:
[example.com:21000] > refresh tab1;
Query: refresh tab1
ERROR: AnalysisException: Table does not exist: default.tab1
Is there some command I need to execute to get all of the impalads running on the data nodes to make a table queryable?
I filed a bug report and got back an answer:
In Impala 1.1 and earlier you need to issue an explicit "invalidate metadata" command to make tables created on other nodes visible to the local Impala daemon.
Starting with Impala 1.2 this won't be necessary; the new catalog service will take care of metadata distribution to all impalad's in the cluster.
So it was INVALIDATE METADATA that I had failed to notice. Glad to hear it won't be necessary as of 1.2.
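For the record, the fix on the node that can't see the table is simply this (output elided; a sketch rather than a verbatim capture):
[example.com:21000] > invalidate metadata;
[example.com:21000] > select * from tab1;
After the INVALIDATE METADATA, the SELECT returns the table's rows instead of the AnalysisException.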