Below is the query which I am trying in Cloudera quick start VM
sqoop import --username training --password training --connect jdbc:mysql://localhost/loudacre --target-dir /sample --split-by accounts.acct_num --query 'select accounts.first_name FROM accounts JOIN accountdevice ON (accounts.acct_num = accountdevice.account_id) WHERE $CONDITIONS' kquote
Below are the table structure of the two tables used in query
mysql> describe accounts;
+----------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+-------+
| acct_num | int(11) | NO | PRI | NULL | |
| acct_create_dt | datetime | NO | | NULL | |
| acct_close_dt | datetime | YES | | NULL | |
| first_name | varchar(255) | NO | | NULL | |
| last_name | varchar(255) | NO | | NULL | |
| address | varchar(255) | NO | | NULL | |
| city | varchar(255) | NO | | NULL | |
| state | varchar(255) | NO | | NULL | |
| zipcode | varchar(255) | NO | | NULL | |
| phone_number | varchar(255) | NO | | NULL | |
| created | datetime | NO | | NULL | |
| modified | datetime | NO | | NULL | |
+----------------+--------------+------+-----+---------+-------+
12 rows in set (0.00 sec)
mysql> describe accountdevice;
+-------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| account_id | int(11) | NO | MUL | NULL | |
| device_id | int(11) | NO | MUL | NULL | |
| activation_date | datetime | NO | | NULL | |
| account_device_id | varchar(255) | NO | | NULL | |
+-------------------+--------------+------+-----+---------+----------------+
I am getting the below exception
SQLSyntaxErrorException: Unknown column 't1.acct_num' in 'field list'
16/10/15 13:48:12 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 't1.acct_num' in 'field list'
Can someone help ?
This is the expected behavior.
Firstly sqoop will fetch metadata (column details) from RDBMS based on your query.
Using query:
select accounts.first_name FROM accounts JOIN accountdevice
ON (accounts.acct_num = accountdevice.account_id) WHERE 1 = 0
you see $CONDITIINS
is replaced with 1 = 0 to fetch metadata.
Now your query will return only 1 column first_name
and you are splitting on acct_num
which is not queried from RDBMS table. That's why you are getting Unknown column error.
So make sure you SELECT split by column too in your SQL query.